A STUDY OF THE MENTAL GROWTH CURVE WITH
SPECIAL REFERENCE TO THE RESULTS OF 
GROUP INTELLIGENCE TESTS 


CHARLES LEONARD ODOM 


Centenary College of Louisiana 


The problems with which this study has to deal may be stated in the
following questions:

1. What is the shape of the mental growth curve when plotted from 
the results of group intelligence tests? 

2. What is the relation between chronological age and absolute 
variability? 

3. What is the upper limit of the growth of test-intelligence? 

The method consists in the scaling of a number of standardized 
group intelligence tests and in plotting the growth curves obtained 
from this scaling. The scaling method used was devised by Thurstone.* 

In this scaling method the mean test performance of any one of the
age groups is chosen as an origin and the standard deviation of any one
of the age groups is chosen as a unit of measurement. In the present
study the mean test performance of the lowest age group represented in
the norms was designated the origin, and the standard deviation in test
ability of that age group was designated the unit of measurement. The
scaling method makes it possible to construct the normal distributions of
ability of the successive age groups on the same base line with a common
origin and unit of measurement. Before the scale is constructed, the
data must satisfy a statistical criterion to ascertain whether each pair
of adjacent age groups can both be represented as normal distributions
on the same base line. When the means of these successive age groups
are plotted against chronological age we have a mental growth curve
with the assumed origin as a datum. While the true ordinates of a
mental growth curve cannot be ascertained without an absolute zero, it
is nevertheless possible to compare legitimately the increments between
the means of the successive age groups.
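The scaling step described above can be sketched in Python. This is a minimal illustration, assuming the usual form of Thurstone's absolute scaling: each age group's cumulative proportions below a set of common raw-score boundaries are converted to normal deviates, and because a boundary X satisfies X = M_A + σ_A z_A = M_B + σ_B z_B, the deviates of two adjacent groups should fall on a straight line whose slope is σ_B/σ_A and whose intercept is (M_B − M_A)/σ_A. Fitting that line through the means and standard deviations of the paired deviates, the frequencies in the usage example, and the names `boundary_z` and `scale_next_group` are all choices made for this sketch, not details taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def boundary_z(counts):
    """Normal deviates at the upper boundaries of consecutive score intervals.

    counts: frequencies of one age group in each score interval, lowest first.
    Returns one deviate per interior boundary, or None where the cumulative
    proportion is 0 or 1 (the deviate would be infinite).
    """
    total = sum(counts)
    inv = NormalDist().inv_cdf
    zs, cum = [], 0
    for c in counts[:-1]:
        cum += c
        p = cum / total
        zs.append(inv(p) if 0 < p < 1 else None)
    return zs


def scale_next_group(mean_a, sd_a, z_a, z_b):
    """Place age group B on the scale already fixed for adjacent group A.

    A common boundary X satisfies X = mean_a + sd_a * z_a = mean_b + sd_b * z_b,
    so z_a is linear in z_b with slope sd_b / sd_a and intercept
    (mean_b - mean_a) / sd_a. Returns (mean_b, sd_b, r), where r is the
    correlation of the paired deviates (near 1 if both groups can be
    represented as normal on the same base line).
    """
    pairs = [(a, b) for a, b in zip(z_a, z_b) if a is not None and b is not None]
    if len(pairs) < 3:
        raise ValueError("need at least three boundaries common to both groups")
    za = [a for a, _ in pairs]
    zb = [b for _, b in pairs]
    ma, mb = mean(za), mean(zb)
    sa, sb = pstdev(za), pstdev(zb)
    slope = sa / sb                # estimates sd_b / sd_a
    intercept = ma - slope * mb    # estimates (mean_b - mean_a) / sd_a
    r = sum((a - ma) * (b - mb) for a, b in pairs) / (len(pairs) * sa * sb)
    return mean_a + intercept * sd_a, slope * sd_a, r


if __name__ == "__main__":
    # Hypothetical frequencies for two adjacent age groups over the same intervals.
    younger = [20, 45, 60, 45, 20, 8, 2]
    older = [5, 15, 35, 55, 50, 30, 10]
    m, s, r = scale_next_group(0.0, 1.0, boundary_z(younger), boundary_z(older))
    print(f"older group: mean {m:.3f}, sigma {s:.3f}, linearity r {r:.3f}")
```

Starting from the lowest age group at mean 0 and σ 1, each older group would be placed relative to the one before it, so the scale values accumulate age by age; a value of `r` well below 1 would suggest that the pair fails the normality criterion.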

The validity of this method of scaling the tests was shown by 
Thurstone in his article previously referred to. He showed that for 
Cyril Burt’s data the distributions of two adjacent age groups can be 
represented as normal on the same abstract base line. 

In the present study the following tests were investigated: 

1. Dearborn Scale, Series I and II.—The data for these tests were 
gathered in small Massachusetts towns. The age-score distributions 
are shown in Tables I and II. 

2. Otis Group Scale, Primary and Advanced.—The data for these
tests were taken directly from the Manual of Directions for the
Primary and Advanced Examinations, Third Revision. In all, the
norms include pupils from about two hundred cities throughout
the country. The ages represented and the numbers tested at each
age are shown in Table V.

3. Illinois Group Intelligence Scale.—This test was given to two
groups of children, one in the elementary schools of Chicago, the other 
in the elementary schools of Bloomington, Illinois. According to the 
norms furnished, the children in the Chicago schools were somewhat 
below normal in intelligence, while those in the Bloomington schools 


TABLE I.—DISTRIBUTIONS OF SCORES IN THE DEARBORN SCALE, SERIES I.¹ DATA
ON SCHOOL CHILDREN IN MASSACHUSETTS TOWNS

                                                Scores
Ages           0  10  20  30  40  50  60  70  80  90 100 110 120 130 140 150 160 170 180 190  Totals
              to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to
               9  19  29  39  49  59  69  79  89  99 109 119 129 139 149 159 169 179 189 199

5 to 5-11     22  50  41  18  13   9   4   3   0   0   0   0   0   0   0   0   0   0   0   0     160
6 to 6-11     15  25  50  53  52  54  26  29  15  11   4   2   2   0   1   0   0   0   0   0     339
7 to 7-11      4   3  17  30  30  39  40  53  58  40  41  31  11  15  10   3   0   0   0   0     425
8 to 8-11      2   1   1   7  14  17  33  29  36  47  39  53  32  26  18  21   4   0   0   0     390
9 to 9-11      0   1   1   3   6   9  10  21  17  26  33  39  45  53  36  24   8   4   1   0     337
10 to 10-11    0   0   0   2   1   3   1   8   6  11  29  33  29  39  57  29  19  14   5   0     286
11 to 11-11    0   0   0   1   0   0   1   1   5   6   7   9  22  22  30  50  37  18   7   0     216
12 to 12-11    0   0   0   1   0   0   0   0   4   2   3   4  14  28  28  41  33  34   4   2     198
13 to 13-11   (entries illegible)                                                               59
Totals                                                                                        2,410





¹ The distributions in Tables I and II were furnished the writer by Dr. Edward A. Lincoln of the
Psycho-educational Clinic at Harvard University.
TABLE III.—DISTRIBUTIONS OF SCORES IN THE ILLINOIS INTELLIGENCE SCALE A.¹
DATA ON CHILDREN IN THE CHICAGO PUBLIC SCHOOLS

                                                Scores
Ages           0  10  20  30  40  50  60  70  80  90 100 110 120 130 140 150 160 170 180  Totals
              to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to  to
               9  19  29  39  49  59  69  79  89  99 109 119 129 139 149 159 169 179 189

8 to 8-11      0   6  17  29  24  11   4   4   6   1 ... ... ... ... ... ... ... ... ...     102
9 to 9-11      2  17  46  87  88  50  46  15   9   4   0 ... ... ... ... ... ... ... ...     373
10 to 10-11    5  16  52 103 135 109  65  41  16   8   2   ?   ?   ?   ?   ?   ?   ?   ?     557
11 to 11-11    4  16  41  73  95 128  94  69  43  32  13   9   1   0   3   1   0 ... ...     622
12 to 12-11    2   8  38  64  84  94  96  92  71  59  35  19  10   6   0   2   0 ... ...     680
13 to 13-11    3  15  30  64  84  97  93 106  87  91  50  29  12   7   4   1   0   0   1     774
14 to 14-11    2   6  19  37  31  39  68  57  68  37  25  12   8   2   2   1   0   0   0     414
15 to 15-11   (entries illegible)                                                          115
Totals                                                                                   3,637

(... = no entry; ? = illegible)





¹ The distributions of scores for these data were furnished the writer by Dr. C. W. Odell of the
Bureau of Educational Research at the University of Illinois.
TABLE V.—MEAN SCORES IN TERMS OF THE STANDARD DEVIATION OF THE LOWEST
AGE GROUP

Dearborn Group Scale, Series I

Age      Mean         σ       +1σ       −1σ   Number of cases
  5     0.000     1.000     1.000    −1.000               160
  6     1.072     1.165     2.237    −0.093               339
  7     2.394     1.292     3.687     1.102               425
  8     3.315     1.415     4.731     1.900               390
  9     3.955     1.485     5.441     2.469               337
 10     4.849     1.606     6.445     3.242               286
 11     5.321     1.554     6.875     3.767               216
 12     5.944     1.622     7.566     4.322               198
 13     6.890     2.050     8.941     4.839                59
Total                                                   2,410

Dearborn Group Scale, Series II

Age      Mean         σ       +1σ       −1σ   Number of cases
  7     0.000     1.000     1.000    −1.000               233
  8     1.369     1.231     2.600     0.138               408
  9     2.241     1.276     3.517     0.964               469
 10     2.990     1.357     4.348     1.632               435
 11     3.748     1.425     5.174     2.323               465
 12     4.402     1.553     5.959     2.849               471
 13     5.062     1.774     6.836     3.288               497
 14     5.734     1.798     7.522     3.935               440
 15     6.329     1.757     8.086     4.571               403
 16     6.767     1.959     8.726     4.808               253
Total                                                   4,074
Otis Advanced Group Scale

Age      Mean         σ       +1σ       −1σ   Number of cases
  8     0.000     1.000     1.000    −1.000               720
  9     0.226     0.970     1.197    −0.742             3,073
 10     0.440     1.068     1.508    −0.628             4,445
 11     0.941     1.092     2.033    −0.151             5,879
 12     1.236     1.154     2.391     0.082             6,648
 13     1.606     1.222     2.828     0.384             6,477
 14     2.240     1.441     3.682     0.799             5,307
 15     2.627     1.552     4.180     1.075             3,391
 16     3.397     1.563     4.960     1.834             1,971
 17     3.866     1.317     5.184     2.549             1,248
 18     3.857     1.338     5.195     2.519               651
 19     3.815     1.348     5.164     2.467               250
Total                                                  40,040

Otis Primary Group Scale

Age      Mean         σ       +1σ       −1σ   Number of cases
  6     0.000     1.000     1.000    −1.000               428
  7     1.005     1.192     2.197    −0.187               501
  8     1.937     1.181     3.118     0.756               537
  9     2.723     1.243     3.966     1.480               542
 10     3.510     1.436     4.946     2.073               546
 11     4.076     1.809     5.885     2.267               262
 12     (illegible)                                       205
National Intelligence Scale A

Age      Mean         σ       +1σ       −1σ   Number of cases
  8     0.000     1.000     1.000    −1.000             2,116
  9     0.512     1.162     1.674    −0.650             4,016
 10     1.082     1.269     2.351    −0.187             5,028
 11     1.711     1.371     3.083     0.339             5,217
 12     2.418     1.533     3.951     0.885             5,330
 13     3.028     1.633     4.661     1.394             5,599
 14     3.761     1.806     5.568     1.955             4,542
 15     4.040     1.818     5.858     2.221               524
Total                                                  32,372

Illinois General Scale—Chicago Data

Age      Mean         σ       +1σ       −1σ   Number of cases
  8     0.000     1.000     1.000    −1.000               102
  9     0.140     0.955     1.095    −0.814               373
 10     0.368     0.978     1.347    −0.610               557
 11     0.618     1.102     1.720    −0.484               622
 12     0.678     1.179     1.858    −0.500               680
 13     0.788     1.239     2.027    −0.450               774
 14     0.786     1.218     2.004    −0.432               414
 15     0.756     1.283     2.039    −0.526               115
Total                                                   3,637

Illinois General Scale—Bloomington Data

Age      Mean         σ       +1σ       −1σ   Number of cases
  8     0.000     1.000     1.000    −1.000               242
  9     0.737     1.258     1.995    −0.521               329
 10     1.315     1.362     2.678    −0.041               353
 11     2.368     1.621     3.989     0.747               330
 12     2.929     1.645     4.574     1.284               355
 13     3.382     1.728     5.120     1.653               325
Total                                                   1,932

Total number of cases in the seven scales: 87,486
were slightly above normal. The age-score distributions of both groups
are shown in Tables III and IV.

4. National Intelligence Scale A.—The data for this test were taken 
directly from Supplement No. 3 to the Manual of Directions. They 
were gathered from nineteen different communities. The ages repre- 
sented and the number tested at each age are shown in Table V. 

Table V summarizes the results. Column 1 gives the chronological
ages of the groups tested; column 2 gives the means on the scale
with unit and origin as described; column 3 gives the standard deviation
for each age group; column 4 shows the scale value of +1σ for each age
group; column 5 shows the scale value of −1σ for each age group; and
column 6 gives the number of children tested in each age group.

Fig. 1.—Curves showing the distribution of Dearborn, Series II, test-intelligence.
Data on 4,074 Massachusetts school children. (Each curve is labelled with its
age, mean, and σ as given in Table V.)


Figure 1 shows a series of frequency distribution curves for the Dearborn
Scale, Series II. Similar diagrams may readily be drawn for the
other data of Table V. In drawing a diagram such as Figure 1, only the
first three columns of Table V are needed. The first probability curve
is drawn arbitrarily. The others are then plotted with reference to the
first, using the means and the standard deviations listed in Table V.
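Following that recipe, the ordinates for a diagram like Figure 1 can be computed from the mean and σ columns alone. The sketch below assumes that each curve is drawn with unit area (so a wider curve is also lower) and uses an arbitrary grid on the common base line; the grid limits and the name `curve_ordinates` are illustrative choices, and any plotting library could then draw the curves.

```python
from statistics import NormalDist

# Age: (mean, sigma) for Dearborn Scale, Series II, copied from Table V.
DEARBORN_II = {
    7: (0.000, 1.000), 8: (1.369, 1.231), 9: (2.241, 1.276),
    10: (2.990, 1.357), 11: (3.748, 1.425), 12: (4.402, 1.553),
    13: (5.062, 1.774), 14: (5.734, 1.798), 15: (6.329, 1.757),
    16: (6.767, 1.959),
}


def curve_ordinates(params, lo=-4.0, hi=13.0, step=0.5):
    """Ordinates of each age group's normal curve on one common base line.

    Every curve is given unit area (an assumption of this sketch), so its
    peak height is 1 / (sigma * sqrt(2 * pi)).
    """
    n = round((hi - lo) / step) + 1
    xs = [lo + i * step for i in range(n)]
    curves = {
        age: [NormalDist(m, s).pdf(x) for x in xs]
        for age, (m, s) in params.items()
    }
    return xs, curves


xs, curves = curve_ordinates(DEARBORN_II)
for age, ys in curves.items():
    peak_x = xs[ys.index(max(ys))]
    print(f"age {age}: peak near {peak_x:+.1f}, height {max(ys):.3f}")
```

Because the first curve has mean 0 and σ 1 by construction, it plays the role of the arbitrarily drawn reference curve; every other curve is positioned and widened relative to it.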

Figures 2 to 8 inclusive show the mental growth curves for each
set of data. The middle curve in each chart shows the mean growth
of test-intelligence. The upper curves show the growth of test-intelligence
of those children who rank +1σ as compared to children of their
own chronological age. The lower curves show the growth of test-intelligence
of those children who rank −1σ as compared to children of
their own chronological age.

These curves were not combined into a composite growth curve 
because it is not certain that the several tests measure the same function. 
Some test abilities may conceivably mature earlier than others. 
By inspection of Figure 2 it may be observed that the curves show
a negative acceleration to age sixteen. They are remarkably smooth,
indicating a gradual growth in mental ability at a diminishing rate.
The same degree of negative acceleration and smoothness is shown in
all but two of these growth curves. The growth curve for the National
Scale A (Figure 6) is practically linear to age fifteen. The Otis Advanced
Group Scale differs markedly from the other curves in that it starts
with a positive acceleration to age thirteen or fourteen, followed by
negative acceleration. The peculiarities of the mental growth curve
for the Otis Advanced Examination are explained by the author in his
Manual of Directions. Otis discards the norms for ages eight and nine
and also for the ages above fifteen. (See pages 55 and 62 of his manual.)
The remaining part of the data, which the author himself accepts,
reveals a linear growth curve.

Fig. 2.—Mental growth curves showing the mean intelligence of children in
successive ages on the Dearborn Scale, Series II. The upper curve shows the
growth in intelligence of children ranking +1σ with reference to children of
their own age. The lower curve shows the growth in intelligence of children
ranking −1σ with reference to children of their own age.
The shape of these curves, with the exceptions noted, is in agreement
with the widely accepted theory that growth in intelligence proceeds at a 
negatively accelerated rate. 
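Since the increments between successive means are the comparable quantities on this scale, negative acceleration can also be checked numerically. The sketch below is an illustration rather than a procedure used in the study (which judged the curves by inspection): it takes the annual gains between the Dearborn Scale, Series II, means of Table V and then the changes in those gains. Mostly negative second differences indicate growth at a diminishing rate. The name `differences` is illustrative.

```python
# Mean scale values for Dearborn Scale, Series II (Table V), ages 7 to 16.
AGES = list(range(7, 17))
MEANS = [0.000, 1.369, 2.241, 2.990, 3.748, 4.402, 5.062, 5.734, 6.329, 6.767]


def differences(values):
    """Successive differences, rounded to the three decimals of the table."""
    return [round(b - a, 3) for a, b in zip(values, values[1:])]


gains = differences(MEANS)      # annual gain between successive ages
changes = differences(gains)    # second differences: change in the annual gain

for (start, end), gain in zip(zip(AGES, AGES[1:]), gains):
    print(f"{start} to {end}: gain {gain:.3f}")
print("second differences:", changes)
print("negative:", sum(c < 0 for c in changes), "of", len(changes))
```

The annual gain falls from 1.369 between ages seven and eight to 0.438 between fifteen and sixteen, although a few second differences are slightly positive; this is the kind of irregularity a smoothed curve drawn by eye would absorb.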

An examination of the literature dealing with variability and
chronological age reveals considerable disagreement. For example,
Freeman, using raw scores to plot growth curves from the results of the
National Intelligence Scale A and the Otis Advanced Group Scale,
found that the median and the twenty-fifth and seventy-fifth percentiles
were parallel, indicating a constant variability with an increase in
chronological age. If this deduction were true, it would imply that the
variability among children at one age is the same as it is among children
at all other ages. On the other hand, Thurstone, using his scaling
method with a constant unit of measurement, plotted growth curves for
Burt’s data on three thousand London school children who had
been given the Binet tests. His mean and upper and lower sigma curves
diverge markedly as they progress from lower to higher ages, indicating
that variability increases with increase in chronological age.

Fig. 3.—Mental growth curves showing the mean intelligence of children in
successive ages on the Dearborn Scale, Series I. The upper curve shows the
growth in intelligence of children ranking +1σ with reference to children of
their own age. The lower curve shows the growth in intelligence of children
ranking −1σ with reference to children of their own age.

The position taken by the writer is that raw scores cannot be used as
direct measurements of mental development, because equal differences
in scores at different levels cannot be assumed to represent equal
increments of mental development. For example, there can be no
guarantee whatever that the increment in mental ability between
scores of ten and fifteen on any test is the same as the increment of
mental ability between ninety and ninety-five. For this reason the
study of mental growth curves and of variability in mental development
for different age levels is entirely useless in terms of raw scores.

Fig. 4.—Mental growth curves showing the mean intelligence of children in
successive ages on the Otis Primary Group Intelligence Scale. The upper curve
shows the growth in intelligence of children ranking +1σ with reference to
children of their own age. The lower curve shows the growth in intelligence of
children ranking −1σ with reference to children of their own age.

The absolute scale that we have used in the present study is more
valid than raw scores as a measure of mental growth, because it can be
shown that equal increments in different parts of the scale represent
equal increments in mental development, on the assumption that the
distribution of mental ability is normal for a large sampling of children
at each age level. The latter assumption is justified in part when the
several age groups can be represented as normal distributions on the
same or common scale.
The reader will observe that in all the growth curves of the present
study the upper and lower curves diverge markedly as they proceed
from lower to higher ages, indicating that absolute variability increases
with chronological age. This fact is in agreement with the results of
Thurstone’s study.
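The divergence can also be summarized from the σ column of Table V. As an illustration, not a computation reported in the study, the sketch below divides each listed scale's standard deviation at the oldest age by that at the youngest (which is 1 by construction). The selection of scales and the name `spread_ratio` are choices made for this sketch.

```python
# Standard deviations by age (unit: sigma of the youngest group), from Table V.
SIGMAS = {
    "Dearborn, Series I (5-13)": [1.000, 1.165, 1.292, 1.415, 1.485, 1.606, 1.554, 1.622, 2.050],
    "Dearborn, Series II (7-16)": [1.000, 1.231, 1.276, 1.357, 1.425, 1.553, 1.774, 1.798, 1.757, 1.959],
    "Otis Advanced (8-19)": [1.000, 0.970, 1.068, 1.092, 1.154, 1.222, 1.441, 1.552, 1.563, 1.317, 1.338, 1.348],
    "National A (8-15)": [1.000, 1.162, 1.269, 1.371, 1.533, 1.633, 1.806, 1.818],
    "Illinois, Chicago (8-15)": [1.000, 0.955, 0.978, 1.102, 1.179, 1.239, 1.218, 1.283],
    "Illinois, Bloomington (8-13)": [1.000, 1.258, 1.362, 1.621, 1.645, 1.728],
}


def spread_ratio(sigmas):
    """Standard deviation at the oldest age relative to that at the youngest."""
    return sigmas[-1] / sigmas[0]


for name, sigmas in SIGMAS.items():
    print(f"{name}: {spread_ratio(sigmas):.2f}")
```

The ratio exceeds 1 for every scale listed, although σ does not rise at every single step (for Otis Advanced it falls after age sixteen), so the claim concerns the overall trend rather than each year-to-year change.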

This conclusion will probably have some bearing upon the con- 
struction of future educational and psychological scales. The Woody 
arithmetic scale and the Trabue language scales are constructed 
on the assumption that variability is constant for the different ages. 
Fig. 5.—Mental growth curves showing the mean intelligence of children in
successive ages on the Otis Advanced Group Intelligence Scale. The upper curve
shows the growth in intelligence of children ranking +1σ with reference to
children of their own age. The lower curve shows the growth in intelligence of
children ranking −1σ with reference to children of their own age.


Our results indicate that, since variability is not constant, these
scaling methods should be revised. It is evident that test material
should cover a progressively wider range as we proceed from lower to
higher ages.

Inspection of the growth curves in Figures 2 to 8 inclusive reveals no
general tendency toward an asymptote at ages thirteen or fourteen,
or even fifteen or sixteen. Reference has already been made to the
peculiarities of the growth curves of Figure 5. But even if we discard all
the data above age fifteen because of selection, it is evident that
test-intelligence is still growing at this age. In the other figures we find
the curves still rising at the highest age levels given in the norms:
in Figure 2 at age sixteen, in Figure 3 at age thirteen, in Figure 4
at age twelve, in Figure 6 at age fifteen, and in Figure 7 at age thirteen,
all with no marked tendency to reach a level.

Fig. 6.—Mental growth curves showing the mean intelligence of children in
successive ages on the National Intelligence Scale A. The upper curve shows the
growth in intelligence of children ranking +1σ with reference to children of
their own age. The lower curve shows the growth in intelligence of children
ranking −1σ with reference to children of their own age.

Figs. 7 and 8.—Mental growth curves showing the mean intelligence of children in
successive ages on the Illinois Intelligence Scale. The upper curves show the
growth in intelligence of children ranking +1σ with reference to children of
their own age. The lower curves show the growth in intelligence of children
ranking −1σ with reference to children of their own age.

There is one exception to the statement made previously that 
there is no general tendency toward an asymptote at ages thirteen to 
sixteen. Figure 8 shows curves that have apparently reached a level 
at age thirteen. If we compare the curves of Figure 7 with those of 
Figure 8, we observe that in Figure 7 there is no tendency for the curves 
to reach an adult level at age thirteen. Since both sets of curves were 
plotted from data obtained from the same test, it cannot be argued that 
the great difference in the absolute amounts of gain shown by the 
two sets of curves is due to differences in the tests used. 

From the characterization of the original data used in plotting
these two sets of curves, it is known that the group of children from the
Chicago schools is, on the average, inferior to the group from
Bloomington. Since we know this fact, we are probably justified in
inferring that the difference referred to is due to a difference in the
mean test-intelligence of the two groups. This deduction is in agreement
with the theory that dull children develop more slowly than normal
children and reach their adult level of growth at an earlier age.


SUMMARY AND CONCLUSIONS 


1. The mental growth curve is shown generally to be negatively 
accelerated when plotted from the results of group intelligence tests. 
Occasionally a curve will be slightly positively accelerated. Occasion- 
ally a curve will approximate a straight line. 

2. Absolute variability increases as chronological age increases. 
This statement holds in the case of every test scaled in this study. 

3. The ability of children to score on group intelligence tests does 
not stop growing before the age of seventeen and very likely not until 
a later age. 

4. Further investigation is necessary to discover the upper limit
for intelligence growth; studies should be made in which adequate
representation is maintained at the higher ages and in which the factor of
selection is eliminated. Such a study might be one in which a large
number of children are tested at an early age and retested annually
until the growth curve reaches a level.
A METHOD FOR CORRECTING COEFFICIENTS OF
CORRELATIONS FOR HETEROGENEITY IN THE

Since 1896, when Pearson published his product-moment formula,
millions of correlations have been computed between hundreds of 
variables. Yet in the field of social and mental measurements not 
a single relationship has been definitely established even within the 
limits of four times its probable error. While it is true that the 
correlations between certain variables are known to be positive and 
low, and between other variables positive and high, yet not one 
of these is definitely established with anything like the accuracy of 
the mathematical constants of the more exact sciences. One reason 
for this is that in such sciences as physics and astronomy the variables 
are more definite, the instruments of measurement are more precise, 
and the data more homogeneous. 

Let anyone attempt to collect all the measurements that have been 
made on any two social or mental variables, and also collect all the 
correlations that have been computed from these data, and he will 
find wide variations in the results. The reason is that the measure- 
ments are taken by different investigators, on different populations, 
under different conditions, and with different instruments. The 
different batches of data contain different degrees of heterogeneity. 
Such differences in heterogeneity will result in differences in the 
correlations. This accounts in part for the common observation 
that a coefficient of correlation does not represent the proportion of 
common elements between two variables. 

In most psychological data there are two kinds of heterogeneity. 
The first results from imperfections in the instruments and the varying 
conditions under which the measurements are made. This is com- 
monly known as errors of measurement. The other results from the 
way in which the populations are selected and is known as errors of 
sampling or selection. Statistical devices for correcting coefficients
of correlation for both kinds of heterogeneity are well known and
widely used. But the assumptions on which these devices are built
are not always fulfilled. Cases are constantly arising to which
Spearman’s formula for correcting for attenuation (chance errors of
measurement) does not apply, and others to which the usual methods of
correcting for variations in sampling or selection do not apply.

This paper is concerned only with the second kind of heterogeneity 
—that which results from variations in population selections. The first 
kind is neglected not because the Spearman technique is deemed 
adequate for all problems but because we have nothing to propose in its 
place. 

The problem of the influence of heterogeneity due to selection or
sampling on coefficients of correlation has been discussed by Pearson,¹
Yule,² Brown and Thomson,³ Kelley⁴ and others. Pearson has contributed
several important formulæ, among which is his cosine formula
for showing the influence of any degree of selection on correlation.
Yule has contributed the partial correlation formula, and Kelley has
contributed an excellent discussion of the whole problem of heterogeneity
as it occurs in the field of psychological measurements. He mentions
five factors of heterogeneity that are likely to be disturbing in
many kinds of psychological data. They are (a) maturity, (b) racial
origin, (c) nurture, (d) sex, and (e) poor selection of the tests employed.
He is of the opinion that factors of this sort are responsible for much of
the observed positive correlation between mental traits.

Concerning the influence of selectional heterogeneity on correlations,
the following propositions may be made: (1) Correlations
between the same two traits computed on different data will not be
comparable unless the data are of the same degree of heterogeneity.
In other words, the cases must be selected in the same way in respect
to age, sex, race, and other factors that are correlated with
either of the traits in question. (2) The presence of heterogeneous
factors will tend to increase the correlation between two variables when
these factors are correlated with each variable and the correlations are
of the same sign. If age is correlated positively with both variables A
and B, or negatively with both, it will tend to increase the correlation
between A and B. (3) The presence of heterogeneous factors will tend to
decrease the correlation between two variables when they are correlated
positively with one and negatively with the other, or zero with one
and positive or negative with the other. The truth of these statements
will be apparent by examining the conditions under which a
partial correlation of the first order is greater or less than the
correlation of the zero order. In general, we may say that heterogeneity
tends to increase correlation when its effects on the two variables
are similar; it tends to decrease correlation when its effects on the two
variables are dissimilar, or when it materially affects one and not the
other.

¹ Pearson, K.: On the Influence of Natural Selection on the Variability and
Correlation of Organs. Philosophical Transactions, A, Vol. CC, 1902, pp. 1-66.

² Yule, G. U.: “An Introduction to the Theory of Statistics.”

³ Brown, William, and Thomson, G. H.: “The Essentials of Mental Measurement.”

⁴ Kelley, T. L.: “Crossroads in the Mind of Man.”
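The propositions can be checked against Yule's standard first-order partial correlation formula, r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²)), where 3 is the heterogeneous factor. The zero-order correlations below are hypothetical values chosen only to show each case; the name `partial_r` is illustrative.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Yule's first-order partial correlation r12.3: factor 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical zero-order correlations (r12, r13, r23).
cases = {
    "factor related to both, same sign": (0.50, 0.60, 0.60),
    "factor related to both, opposite signs": (0.10, 0.60, -0.60),
    "factor related to one variable only": (0.10, 0.60, 0.00),
}

for label, (r12, r13, r23) in cases.items():
    print(f"{label}: r12 = {r12:.3f}, r12.3 = {partial_r(r12, r13, r23):.3f}")
```

In the first case removing the factor lowers the correlation from 0.50 to about 0.22, so the heterogeneity had inflated it, as in proposition (2); in the other two cases it raises the correlation (to about 0.72 and 0.125), so the heterogeneity had depressed it, as in proposition (3).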

From these propositions two problems arise. The first is that of 
completely eliminating all heterogeneity so that the resulting correla- 
tion will represent the intrinsic relation between the two variables. 
The second is that of securing the same degree of heterogeneity in 
all groups of data so that the resulting correlations, between the 
same two traits, will be comparable. Two methods are in common use 
for dealing with these problems. One is that of narrowing the data 
down, in each case, to strictly homogeneous groups, and making all 
computation within these groups. A sample of such a narrow homogeneous
group would be all twelve-year-old Anglo-Saxon boys whose
fathers are farmers. The other method is the partial correlation
technique. Consider these two methods in order. 

The first one requires a decision on the degree of heterogeneity that
will be permitted. Kelley¹ thinks that in dealing with most psychological
variables the data should be homogeneous in respect to age,
sex, and race. Home background and socio-economic status might
also be added because they are correlated with many social and mental
traits. One criterion might be to rule out everything that seems likely
to be correlated with either variable in question. Another rule might
be to permit whatever degree of heterogeneity best suits the
purposes of the investigation.

The first method also requires a decision on which of the homogeneous
groups will be chosen for the experiment. Will it be the twelve-year-old
white boys whose fathers are farmers, or the twelve-year-old
white girls whose fathers are farmers, or some other grouping?
If it is test material that is being handled and if the tests are given
in school, then all the children of a given room will be tested. A
great many rooms or classes must be tested to get enough cases for
any one of the homogeneous sub-groups. If this is done, there will
be plenty of cases in more than one sub-group. The investigator is
then faced with the arithmetical labor involved in computing the
correlations for each of the numerous homogeneous sub-groups in which
he has sufficient data.

¹ “Crossroads in the Mind of Man.” P. 27.

If this procedure is followed, and if a separate correlation is computed
for each sub-group, the average of such correlations is not the
mathematical equivalent of the partial correlation with the heterogeneous
factors constant.

The second method is the partial correlation technique. The 
formula in common use was first derived by Yule.¹ It is a special case
of Pearson’s more general cosine formula which was developed later. 
Pearson’s formula gives the correlation between two traits when the 
variability of a third with which they are both associated is reduced 
to any given degree. Yule’s formula, on the other hand, always 
reduces the variability of the third trait to zero. 

The difficulties with the partial correlation technique (Yule’s 
formula) are well known to critical students of statistics. Some of 
the more obvious are: (1) It partials out too much; (2) it assumes 
a linear system of regressions; (3) it cannot be used unless the trait 
that is to be partialled out can be measured. Racial origin has been 
mentioned as one heterogeneous factor that should be kept constant 
in most psychological work. But there is no quantitative measure for 
race as such. 

A third method is proposed here that avoids the arithmetical labor 
required by the first, and the assumptions and other difficulties of the 
second. It is a more general formula than Yule’s but less general than 
Pearson’s and is mathematically akin to both. It is a simple formula 
for finding the correlation between two variables when the deviations 
are taken from the means of the homogeneous sub-groups rather 
than from the mean of the entire population. It is not new but it has 
not (to the knowledge of the writer) been used before as a substitute 
for the partial correlation formula. 

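The formula is

$$ r_{\xi_1\xi_2} = \frac{r_{12}\,\sigma_1\sigma_2 - r_{m_1m_2}\,\sigma_{m_1}\sigma_{m_2}}{\sqrt{\sigma_1^2 - \sigma_{m_1}^2}\,\sqrt{\sigma_2^2 - \sigma_{m_2}^2}} \qquad (1) $$

The symbols used have the following meanings:

$r_{\xi_1\xi_2}$ is the correlation between the two variables when the
deviations are taken from the means of the homogeneous sub-groups.

$r_{12}$ is the correlation between the two variables when all the
homogeneous sub-groups are thrown into a single scatter diagram, and the
deviations taken from the means of the entire population in the usual
way.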











$r_{m_1m_2}$ is the correlation between the means of the sub-groups when
each is weighted by its n.

$\sigma_1$ and $\sigma_2$ are the standard deviations of the entire population.

$\sigma_{m_1}$ and $\sigma_{m_2}$ are the standard deviations of the means of
the sub-groups.

¹ Yule, G. U.: On the Significance of Bravais’ Formulæ for Skew Correlation.
Proceedings of the Royal Society, Vol. LX, 1896, pp. 477-489.

To compute such a correlation the following steps are necessary: 

1. The data must be divided into homogeneous sub-groups as in 
the first method described above. The mean of each sub-group 
must be found for each variable, and the number of cases in each 
recorded. 

2. The correlation between the means of the sub-groups must be computed for the two variables in question, and the standard deviations of each set of means recorded. In fact, since only the product $r_{m_1m_2}\sigma_{m_1}\sigma_{m_2}$ enters the numerator of (1), it is not necessary to find r and the two sigmas separately for that term, but only the sum of the products of the deviations of the sub-group means from the means of the entire population, each product weighted by the number of cases in its group, and divided by N.

3. Then all sub-groups are thrown into one scatter-diagram and the correlation $r_{12}$ is computed in the usual way.
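The three steps can be sketched in Python as follows. This is a minimal illustration, assuming each sub-group is supplied as a list of (x1, x2) pairs; the function names and the sample data are hypothetical. The second function computes the same coefficient directly from deviations about each sub-group's own means, which is the quantity formula (1) is meant to yield, so the two results should agree.

```python
import math


def within_group_r(groups):
    """Formula (1): r between x1 and x2 with deviations taken from sub-group means.

    groups: list of sub-groups, each a list of (x1, x2) pairs (hypothetical layout).
    All variances and covariances divide by N, as in the text.
    """
    pairs = [p for g in groups for p in g]
    N = len(pairs)
    g1 = sum(x for x, _ in pairs) / N  # grand mean of variable 1
    g2 = sum(y for _, y in pairs) / N  # grand mean of variable 2

    # Step 3: one scatter-diagram, deviations from the grand means.
    c12 = sum((x - g1) * (y - g2) for x, y in pairs) / N  # r12 * sigma1 * sigma2
    v1 = sum((x - g1) ** 2 for x, _ in pairs) / N         # sigma1^2
    v2 = sum((y - g2) ** 2 for _, y in pairs) / N         # sigma2^2

    # Steps 1-2: sub-group means and their deviations, weighted by n.
    cm12 = vm1 = vm2 = 0.0
    for g in groups:
        n = len(g)
        d1 = sum(x for x, _ in g) / n - g1  # Delta_1 for this sub-group
        d2 = sum(y for _, y in g) / n - g2  # Delta_2 for this sub-group
        cm12 += n * d1 * d2
        vm1 += n * d1 * d1
        vm2 += n * d2 * d2
    # r_m1m2*sigma_m1*sigma_m2, sigma_m1^2, sigma_m2^2
    cm12, vm1, vm2 = cm12 / N, vm1 / N, vm2 / N

    return (c12 - cm12) / (math.sqrt(v1 - vm1) * math.sqrt(v2 - vm2))


def pooled_within_r(groups):
    """Direct check: deviations measured from each sub-group's own means."""
    sxy = sxx = syy = 0.0
    for g in groups:
        n = len(g)
        a = sum(x for x, _ in g) / n
        b = sum(y for _, y in g) / n
        for x, y in g:
            sxy += (x - a) * (y - b)
            sxx += (x - a) ** 2
            syy += (y - b) ** 2
    return sxy / math.sqrt(sxx * syy)


groups = [[(1, 2), (2, 3), (3, 3)], [(5, 1), (6, 3), (7, 4)], [(2, 8), (4, 9)]]
print(within_group_r(groups), pooled_within_r(groups))
```

The agreement follows because the cross products of the ξ's and Δ's vanish when summed, which is exactly the step used in the derivation of (1).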

The derivation of formula (1) is as follows:

Let N be the total population.

Let $n_a, n_b, n_c, \ldots$ be the populations of the sub-groups.

Let x be the deviation of any measure from the mean of the entire population.

Let ξ be the deviation from the mean of a sub-group.

Let σ be the standard deviation of the entire population.

Let s be the standard deviation of the ξ's.

Let $\sigma_m$ be the standard deviation of the means of the sub-groups.

Let Δ be the difference between the mean of a sub-group and the mean of the entire population.

Let the subscripts 1 and 2 denote the two variables.

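Consider first a single variable.

$$x = \xi + \Delta \tag{2}$$

Squaring, summing, and dividing by N, and remembering that $\Sigma\xi\Delta$ is zero (the ξ's of each sub-group sum to zero, and Δ is constant within a sub-group),

$$\frac{\Sigma x^2}{N} = \frac{\Sigma \xi^2}{N} + \frac{\Sigma \Delta^2}{N} \tag{3}$$

which may be written

$$\sigma^2 = s^2 + \sigma_m^2 \tag{4}$$

Transposing,

$$s^2 = \sigma^2 - \sigma_m^2 \tag{5}$$

$$\sigma_m^2 = \sigma^2 - s^2 \tag{6}$$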
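Equation (4) can be checked numerically. The sketch below, in Python, assumes a single variable supplied as a list of sub-groups of scores (hypothetical data); it computes the total variance, the variance of the ξ's, and the n-weighted variance of the sub-group means, all with divisor N.

```python
import math


def variance_parts(groups):
    """Return (sigma^2, s^2, sigma_m^2) for one variable divided into sub-groups.

    groups: list of lists of scores (hypothetical data layout); divisors are N.
    """
    scores = [x for g in groups for x in g]
    N = len(scores)
    grand = sum(scores) / N
    means = [sum(g) / len(g) for g in groups]
    sigma2 = sum((x - grand) ** 2 for x in scores) / N                  # total variance
    s2 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / N  # variance of the xi's
    sigma_m2 = sum(len(g) * (m - grand) ** 2
                   for g, m in zip(groups, means)) / N                  # n-weighted means
    return sigma2, s2, sigma_m2


sigma2, s2, sigma_m2 = variance_parts([[1, 2, 3], [5, 6, 7, 10], [2, 4]])
print(sigma2, s2 + sigma_m2)  # equal, apart from floating-point rounding
```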
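Consider the case of two variables.

$$x_1 = \xi_1 + \Delta_1$$

$$x_2 = \xi_2 + \Delta_2$$

Multiplying, summing, and dividing by N, remembering that the terms $\Sigma\xi_1\Delta_2$ and $\Sigma\xi_2\Delta_1$ are zero,

$$\frac{\Sigma x_1x_2}{N} = \frac{\Sigma\xi_1\xi_2}{N} + \frac{\Sigma\Delta_1\Delta_2}{N} \tag{7}$$

which may be written

$$r_{12}\sigma_1\sigma_2 = r_{\xi_1\xi_2}s_1s_2 + r_{\Delta_1\Delta_2}\sigma_{m_1}\sigma_{m_2} \tag{8}$$

Transposing,

$$r_{\xi_1\xi_2}s_1s_2 = r_{12}\sigma_1\sigma_2 - r_{\Delta_1\Delta_2}\sigma_{m_1}\sigma_{m_2} \tag{9}$$

Substituting from (5) and remembering that $r_{\Delta_1\Delta_2} = r_{m_1m_2}$,

$$r_{\xi_1\xi_2} = \frac{r_{12}\sigma_1\sigma_2 - r_{m_1m_2}\sigma_{m_1}\sigma_{m_2}}{\sqrt{\sigma_1^2 - \sigma_{m_1}^2}\,\sqrt{\sigma_2^2 - \sigma_{m_2}^2}} \tag{1}$$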

That Yule's formula for partial correlation is a special case of formula (1) may now be shown. Assume, as does the Yule formula, that the heterogeneous factor can be measured and that the regressions of variables 1 and 2 on it are linear. Designate the heterogeneous factor as 3.


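On the assumptions of the partial correlation formula,

$$\Delta_1 = b_{13}x_3$$

$$\Delta_2 = b_{23}x_3$$

where $x_3$ is the deviation of the sub-group's value of variable 3 from its general mean. Multiplying, summing and dividing by N,

$$\frac{\Sigma\Delta_1\Delta_2}{N} = b_{13}b_{23}\sigma_3^2 = r_{13}\frac{\sigma_1}{\sigma_3}\,r_{23}\frac{\sigma_2}{\sigma_3}\,\sigma_3^2 = r_{13}r_{23}\sigma_1\sigma_2$$

But $\Sigma\Delta_1\Delta_2/N = r_{m_1m_2}\sigma_{m_1}\sigma_{m_2}$, so that

$$r_{m_1m_2}\sigma_{m_1}\sigma_{m_2} = r_{13}r_{23}\sigma_1\sigma_2 \tag{10}$$

Also, on the assumptions of the partial correlation formula,

$$s_1^2 = \sigma_1^2(1 - r_{13}^2) \tag{11}$$

$$s_2^2 = \sigma_2^2(1 - r_{23}^2) \tag{12}$$

Substituting (10), (11) and (12) in (1) we have

$$r_{\xi_1\xi_2} = \frac{r_{12}\sigma_1\sigma_2 - r_{13}r_{23}\sigma_1\sigma_2}{\sigma_1\sigma_2\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

which is the partial correlation formula.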
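A numerical check of this special case is sketched below in Python. The data are invented so as to satisfy the assumptions exactly: variable 3 is constant within each sub-group, and the within-group deviations of variables 1 and 2 sum to zero in every sub-group, so the sub-group means lie on straight lines in variable 3. Under those assumptions formula (1) and Yule's partial correlation should coincide; with real data they will generally differ.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Covariance with divisor N."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


def group_means(x, labels):
    """Replace each score by the mean of its sub-group."""
    groups = {}
    for lab, val in zip(labels, x):
        groups.setdefault(lab, []).append(val)
    return [mean(groups[lab]) for lab in labels]


def formula_1(x1, x2, labels):
    """Correlation with deviations from sub-group means, by formula (1)."""
    m1, m2 = group_means(x1, labels), group_means(x2, labels)
    num = cov(x1, x2) - cov(m1, m2)  # r12*s1*s2 - r_m*sigma_m1*sigma_m2
    return num / math.sqrt((cov(x1, x1) - cov(m1, m1)) * (cov(x2, x2) - cov(m2, m2)))


def partial_r12_3(r12, r13, r23):
    """Yule's first-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical data: three sub-groups with variable 3 equal to 0, 1 and 2.
x3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
e1 = [0.9, -0.4, -0.5, 0.3, -0.3, 0.0, -1.2, 0.4, 0.8]  # sums to zero in each sub-group
e2 = [0.2, 0.6, -0.8, -0.5, 0.1, 0.4, 0.7, -0.9, 0.2]   # likewise
x1 = [2.0 * t + e for t, e in zip(x3, e1)]
x2 = [-1.5 * t + e for t, e in zip(x3, e2)]

print(formula_1(x1, x2, x3),
      partial_r12_3(corr(x1, x2), corr(x1, x3), corr(x2, x3)))
```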
Some of the advantages of formula (1) over the Yule formula for 
partial correlation are the following: 

1. It may be used when the heterogeneous factors cannot be meas- 
ured. Race has been mentioned as a case in point. Since race can- 
not be measured quantitatively it cannot be partialled out by the Yule 
formula, but its effects on correlation may be determined by formula (1). 

2. The Yule formula assumes that the regressions of variables 1 and 
2 on 3 (the one to be partialled out) are linear. It assumes further 
that the means of the sub-groups lie on these straight lines. The 
formula proposed here does not make either of these assumptions. 

3. This formula has the further advantage of handling any number 
of heterogeneous factors in one operation by the way in which the 
sub-groups are selected. They may be narrowed down to any degree 
of homogeneity that is desired or that the data will permit. 

4. It also enables the investigator to "spot" peculiarities in the data and run them down. Suppose, for example, that $r_{12}$ is positive and $r_{m_1m_2}$ is negative. An investigation of such an unusual occurrence might reveal that one sub-group is "off" in one of the variables. Such a group could then be eliminated.

5. The sub-groups may be made up in many different ways. They may be made from the population of classroom groups, boys' gangs, college fraternities, cities, or states. It might aid in interpretation to compute $r_{\xi_1\xi_2}$ from many such groupings of the same data.

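6. The formula may be turned around and written in two more ways:

$$r_{12} = \frac{r_{\xi_1\xi_2}s_1s_2 + r_{m_1m_2}\sigma_{m_1}\sigma_{m_2}}{\sqrt{s_1^2 + \sigma_{m_1}^2}\,\sqrt{s_2^2 + \sigma_{m_2}^2}} \tag{13}$$

$$r_{m_1m_2} = \frac{r_{12}\sigma_1\sigma_2 - r_{\xi_1\xi_2}s_1s_2}{\sqrt{\sigma_1^2 - s_1^2}\,\sqrt{\sigma_2^2 - s_2^2}} \tag{14}$$

One value of these two expressions is that they show clearly the relation between the correlation of sub-group means and that of the individual measures. The conditions under which the correlation between the means of the sub-groups would be zero or plus or minus one may readily be seen from formula (14). For example, $r_{m_1m_2}$ is zero when $r_{12}\sigma_1\sigma_2 = r_{\xi_1\xi_2}s_1s_2$.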
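Formulas (13) and (14) can be exercised together in a short Python sketch. The values below are hypothetical (within-group r of .3 and between-group r of .8, with chosen standard deviations); σ₁ and σ₂ are built from (4). Passing the r₁₂ obtained from (13) into (14) should recover the correlation of the means.

```python
import math


def r12_from_13(r_xi, s1, s2, r_m, sm1, sm2):
    """Formula (13): total r12 from the within-group and between-group parts."""
    return (r_xi * s1 * s2 + r_m * sm1 * sm2) / (
        math.sqrt(s1 ** 2 + sm1 ** 2) * math.sqrt(s2 ** 2 + sm2 ** 2))


def r_means_from_14(r12, sigma1, sigma2, r_xi, s1, s2):
    """Formula (14): correlation of the sub-group means."""
    return (r12 * sigma1 * sigma2 - r_xi * s1 * s2) / (
        math.sqrt(sigma1 ** 2 - s1 ** 2) * math.sqrt(sigma2 ** 2 - s2 ** 2))


# Hypothetical values.
s1, s2, sm1, sm2 = 2.0, 3.0, 1.0, 1.5
sigma1 = math.sqrt(s1 ** 2 + sm1 ** 2)  # by (4)
sigma2 = math.sqrt(s2 ** 2 + sm2 ** 2)
r12 = r12_from_13(0.3, s1, s2, 0.8, sm1, sm2)
r_m = r_means_from_14(r12, sigma1, sigma2, 0.3, s1, s2)
print(r12, r_m)
```

Here (13) gives r₁₂ = (1.8 + 1.2)/7.5 = .4, and (14) returns the correlation of the means, .8.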
ENVIRONMENTAL AND CHARACTER FACTORS
INVOLVED IN SCHOLASTIC SUCCESS: 1926-1927 DATA


FRANK K. SHUTTLEWORTH 


Sterling Fellow, Yale University 


The author has recently published a detailed report of experimental 
work during the school years 1923-1926 which centered around the 
measurement of the character and environmental factors involved in 
scholastic success. The battery of three tests described in that study 
was revised and given to all freshmen entering the University of Iowa 
in the fall of 1926. The tests were given the second week of school in 
classes in freshman speech through the courtesy of Professor Edward C. 
Mabie and the instructors in the Department of Speech. 

The revised 1926 test, under the title "University of Iowa Student Information Blank," consisted of three parts requiring from twenty-five to thirty minutes testing time. Part I consisted of sixty items in the form illustrated in Exhibit A. These items asked for information


Exhibit A

Instructions: Place a cross (x) on the line after the most correct answer to the following questions. Be accurate. Give complete information. Work reasonably fast.

1. What is your sex? Male......; female......

2. Your age? 16 or younger...... … 24 or older......

3. Your weight? (Check the nearest answer) 90 pounds...... …

Etc.
Exhibit B

In the first column to the left indicate the number of units you earned in each subject. A unit represents a year's credit in a course meeting five times a week. If you did not take a particular course write "none" in this column.

In the other columns make a cross (x) under the heading which indicates where you stood in each subject.

Give complete information. Be accurate. Work reasonably fast.





1 Shuttleworth, Frank K.: "The Measurement of the Character and Environmental Factors Involved in Scholastic Success." Iowa Studies in Character, Vol. I, No. 2.
Subject                      Number    At the   Upper 10   Above   At the   Below   Lower 10   At the
                             of units  top      per cent           middle           per cent   bottom
Composition, rhetoric,
  grammar.....
Literature (American,
  English, Classic).....

Etc.


Exhibit C

In this part of the information blank you are to indicate whether you like, dislike, or feel indifferent about certain things.

Notice the lists below. You are to
draw a cross through the D after the things you dislike very much
draw a cross through the d after the things you dislike somewhat
draw a cross through the ? after the things you feel indifferent about
draw a cross through the l after the things you like somewhat
draw a cross through the L after the things you like very much

You probably dislike murder very much. Notice how this word is marked in the first list below. You probably like very much to be happy, you probably neither like nor dislike pencils, undoubtedly you rather dislike to be hurt and rather like to be well. Notice how these things are marked in the list below.

Mark all the words and phrases below in the same manner. Begin slowly, then work fast. Be accurate. Give complete information.

murder        D d ? l L        canoeing            D d ? l L
be happy      D d ? l L        cigarette           D d ? l L
pencils       D d ? l L        concerts            D d ? l L
be hurt       D d ? l L        abject poverty      D d ? l L
be well       D d ? l L        America first       D d ? l L
anguish       D d ? l L        be ambitious        D d ? l L
ankle         D d ? l L        build a radio       D d ? l L
agitator      D d ? l L        clerical work       D d ? l L
Beethoven     D d ? l L        be discouraged      D d ? l L
burglary      D d ? l L        do religious work   D d ? l L
candidate     D d ? l L

Etc.


about high school activities; the intellectual, cultural, economic, social 
and religious status of the home; various personal and recreational 
practices; and certain study habits. With the exception of ten items, 
all of these had shown some tendency to be diagnostic of scholastic success. The more diagnostic items of Part I were scored by means of stencils, separate stencils being prepared for men and women. Items of information which previous research had shown to be associated with scholastic success were scored plus one; items associated with scholastic failure were scored minus one; all other items were ignored.
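The stencil rule can be sketched in Python. The article does not publish its stencils, so the item numbers, answers and keys below are entirely hypothetical; the sketch only illustrates the plus-one, minus-one, ignore rule with separate keys for men and women.

```python
def stencil_score(answers, plus_key, minus_key):
    """Score Part I with a stencil: +1 for each answer keyed to success,
    -1 for each answer keyed to failure, 0 for all others.

    answers:   {item_number: answer_checked}
    plus_key:  set of (item_number, answer) pairs keyed +1 (hypothetical)
    minus_key: set of (item_number, answer) pairs keyed -1 (hypothetical)
    """
    score = 0
    for item, answer in answers.items():
        if (item, answer) in plus_key:
            score += 1
        elif (item, answer) in minus_key:
            score -= 1
    return score


# Separate (invented) keys for men and women, as the text describes.
KEYS = {
    "M": ({(4, "a"), (7, "c")}, {(4, "d"), (9, "b")}),
    "F": ({(4, "b"), (7, "c")}, {(9, "b")}),
}
answers = {1: "male", 4: "a", 7: "c", 9: "b"}
print(stencil_score(answers, *KEYS["M"]))  # 1 + 1 - 1 = 1
```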

Part II asked for comparative scholastic standing in thirty commonly 
elected high school courses, for general standing during the four 
years of high school, and for general standing during the senior 
year. This was in the form illustrated in Exhibit B. Students were 
asked to indicate in the first column after each subject the number of 
units credit, and to indicate with a cross their standing relative to the 
rest of their class under the appropriate column. Values of 5, 4, 3, 2, 1, 
and 0 were assigned to these standings and multiplied by the number 
of units credit indicated. The score was the sum of the products 
divided by the number of units. A slight correction was made for the 
number in the graduating class, since it had been discovered that 
students from the smaller graduating classes tended to overstate their 
standing. 
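The Part II score described above is a unit-weighted mean of the standing values. A minimal Python sketch follows, assuming each subject is reported as a pair (units, value) with value from 0 to 5; the data are hypothetical, and the correction for size of graduating class is omitted because the article does not specify it.

```python
def part_ii_score(entries):
    """Unit-weighted mean standing for Part II.

    entries: list of (units, value) pairs, value in 0..5 as assigned in the text.
    The correction for size of graduating class is not specified in the article
    and is therefore not applied here.
    """
    weighted = units = 0.0
    for u, v in entries:
        if not 0 <= v <= 5:
            raise ValueError("standing value must lie between 0 and 5")
        weighted += u * v
        units += u
    if units == 0:
        raise ValueError("no units of credit reported")
    return weighted / units


# Hypothetical student: 4 units at value 5, 2 units at value 3, 1 unit at value 0.
print(part_ii_score([(4, 5), (2, 3), (1, 0)]))  # 26 / 7
```

A course marked "none" can be entered with zero units, in which case it contributes nothing to either sum.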

Part III was a test of attitudes and interests in the form illustrated in Exhibit C. One hundred twenty stimulus words and phrases made up this part of the test. All had proved in the earlier study to be more or less diagnostic of scholastic success. Part III was also scored by stencils separately prepared for men and women. Only the most diagnostic items were used.

The scores from the three parts of the test were combined. With 
the standard deviations equal, Part I was given a weight of two, Part II 
a weight of four, and Part III a weight of one. The resulting scores 
were deposited with the university examiner. When the correlations of 
the regular entrance examinations with first semester grades were 
calculated, the predictive value of the information blank scores was also 
determined. Table I displays the correlations of the four entrance examinations and the information blank with first semester grades, together with the inter-correlations of the five tests. The predictive powers of the five tests in terms of these zero order correlations vary from .446 to .622. The only marked advantage is that of English training in predicting the first semester grades of women.
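The weighting of the three parts can be illustrated in Python. The article does not say how the standard deviations were equated; converting each part to standard scores, as below, is one way and is an assumption here, as are the sample scores. The weights 2, 4 and 1 are those given in the text.

```python
import statistics


def composite(part1, part2, part3, weights=(2, 4, 1)):
    """Weighted composite of three part scores after equating their SDs.

    Each part is converted to standard scores (mean 0, SD 1) before weighting.
    Assumes no part has zero spread.
    """
    def z(scores):
        m = statistics.fmean(scores)
        sd = statistics.pstdev(scores)
        return [(x - m) / sd for x in scores]

    zs = [z(p) for p in (part1, part2, part3)]
    return [sum(w * zp[i] for w, zp in zip(weights, zs)) for i in range(len(part1))]


# Hypothetical scores of three students on Parts I, II and III.
c = composite([1, 2, 3], [10, 20, 30], [5, 5, 6])
print(c)
```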

The inter-correlations of the five tests vary from .40 to .70. This 
results in some interesting changes in apparent predictive powers when 
the fourth order partial correlations are calculated. These are 
TABLE I.—INTER-CORRELATIONS OF THE FOUR ENTRANCE EXAMINATION PERCENTILES, THE INFORMATION BLANK SCORES, AND FIRST SEMESTER GRADE POINTS FOR MEN (M), FOR WOMEN (F), AND FOR MEN AND WOMEN (MF); NUMBER OF MEN 486, NUMBER OF WOMEN 257; 1926-1927 DATA

Variables                                  Sex   GP1    RC2    EA3    ET4    HS5    IB6
1. First semester grade points             M     ....   .446   .453   .492   .480   .481
                                           F     ....   .519   .526   .622   .545   .505
                                           MF    ....   .462   .503   .569   .525   .469
2. Reading comprehension, D2               M     .446   ....   .623   .570   .663   .443
                                           F     .519   ....   .621   .582   .634   .433
                                           MF    .462   ....   .623   .552   .645   .446
3. English aptitude, 1, Rev. B             M     .453   .623   ....   .667   .591   .528
                                           F     .526   .621   ....   .697   .506   .430
                                           MF    .503   .623   ....   .699   .538   .519
4. English training, 1, Rev. B             M     .492   .570   .667   ....   .572   .528
                                           F     .622   .582   .697   ....   .595   .461
                                           MF    .569   .552   .699   ....   .519   .449
5. High school content examination, B-1    M     .480   .663   .591   .572   ....   .511
                                           F     .545   .634   .506   .595   ....   .431
                                           MF    .525   .645   .538   .519   ....   .495
6. Information blank, 1926                 M     .481   .443   .528   .528   .511   ....
                                           F     .505   .433   .430   .461   .431   ....
                                           MF    .469   .446   .519   .449   .495   ....
TABLE II.—FOURTH ORDER PARTIAL CORRELATIONS AND MULTIPLE R'S OF THE FIVE TESTS WITH GRADES FOR MEN, FOR WOMEN, AND FOR MEN AND WOMEN; 1926-1927 DATA (FOR VARIABLES SEE TABLE I)

Fourth order partials    Men     Women   Men and women
r12.3456                 .077    .083    .029
r13.2456                 .016    .062    .035
r14.2356                 .214    .280    .224
r15.2346                 .112    .166    .208
r16.2345                 .261    .248    .182

Multiple R's
R1.23456                 .601    .695    .649
presented in Table II. From these figures English training, the high school content examination, and the information blank have the distinct advantage in the measurement of independent factors. The multiple R's are for men .601, for women .695, and for men and women .649.
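Partial correlations of this kind and the multiple R can both be read from the inverse of the correlation matrix. The Python sketch below uses that standard identity (it is not necessarily the computing routine the author used) and a small Gauss-Jordan inverse so that it needs no external library. The usage applies it to the men's correlations as printed in Table I; since those correlations are rounded to three places, the results may differ slightly from Table II.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def multiple_R(corr, target=0):
    """Multiple R of the target variable on all the others."""
    inv = invert(corr)
    return math.sqrt(1 - 1 / inv[target][target])


def partial_r(corr, i, j):
    """Partial r between variables i and j, holding all the others constant."""
    inv = invert(corr)
    return -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])


# Men's correlations from Table I (order: GP1, RC2, EA3, ET4, HS5, IB6).
men = [
    [1, .446, .453, .492, .480, .481],
    [.446, 1, .623, .570, .663, .443],
    [.453, .623, 1, .667, .591, .528],
    [.492, .570, .667, 1, .572, .528],
    [.480, .663, .591, .572, 1, .511],
    [.481, .443, .528, .528, .511, 1],
]
print(round(multiple_R(men), 3), [round(partial_r(men, 0, j), 3) for j in range(1, 6)])
```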

Since the information blank scores measure high school standing 
in large part, it is important to note its relation to the high school con- 
tent examination, an objective test covering high school materials. 
The correlations of the information blank with the high school content examination are .511, .431, and .495. These correlations are lower than the correlations of the other tests with the high school content examination. Students' reports of high school standing and an objective examination over high school materials do not measure the same thing. Students' reports have the distinct advantage in the measurement of independent factors.

In the earlier study it was pointed out that one of the chief lines 
of improvement in test construction must be in the direction of provid- 
ing analytical scores. To divide a thirty minute test into a series of 
analytical scores results in considerably lowering the reliability of the 
separate scores and necessarily lowers the correlations with grade 
points. Any separate scores, however, which yield even moderate 
correlations should be of value in providing the personnel worker 
with specific hints of possible factors conditioning scholastic 
maladjustment. 

For the purpose of determining the power of the information blank 
to isolate the various factors involved, a sample of one hundred forty 
cases of men and seventy-five cases of women was selected for intensive 
study. Of these, one hundred twenty-three cases of men and sixty-nine 
cases of women finished the first semester. The scores on Parts I, II and III were already available. These were correlated against first semester grade points and the composite of the four entrance examinations, keeping the men and women separate. The resulting data are displayed in Table III. Part II is making by far the larger share of the contribution to the measurement of independent factors. Part III measures too largely factors more adequately measured by the entrance examinations.

The intercorrelations of the three parts of the test are of interest. Do these three parts measure the same thing? Table IV presents for men and for women the intercorrelations of Parts I, II and III. While the average correlation of these three parts with grade points is .43, the average intercorrelation for men is only .25 and the average for women only .38. These data demonstrate that each part of the test is measuring something different from the other parts.
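The two averages quoted above can be reproduced from the three intercorrelations printed in Table IV for each sex; a short Python check:

```python
from statistics import fmean

# Intercorrelations of Parts I, II and III from Table IV (I-II, I-III, II-III).
men = [0.22, 0.32, 0.22]
women = [0.34, 0.42, 0.39]
avg_men, avg_women = round(fmean(men), 2), round(fmean(women), 2)
print(avg_men, avg_women)  # 0.25 0.38
```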


TABLE III.—CORRELATIONS OF PARTS I, II AND III OF THE UNIVERSITY OF IOWA STUDENT INFORMATION BLANK WITH FIRST SEMESTER GRADES AND THE COMPOSITE ENTRANCE EXAMINATIONS; 123 CASES OF MEN, 69 CASES OF WOMEN; 1926-1927 DATA

                         Men                                 Women
Parts of test   Grade points   Composite entrance   Grade points   Composite entrance
                               examinations                        examinations
I               .341 ± .05     .487 ± .05           .306 ± .07     .315 ± .07
II              .453 ± .05     .414 ± .05           .562 ± .06     .554 ± .06
III             .434 ± .05     .607 ± .04           .456 ± .06     .609 ± .05
TABLE IV.—INTERCORRELATIONS OF PARTS I, II AND III FOR MEN (M) AND FOR WOMEN (F); 123 CASES OF MEN, 69 CASES OF WOMEN; 1926-1927 DATA

Variables   Sex   Part I       Part II      Part III
Part I      M     ....         .22 ± .06    .32 ± .05
            F     ....         .34 ± .07    .42 ± .07
Part II     M     .22 ± .06    ....         .22 ± .06
            F     .34 ± .07    ....         .39 ± .07
Part III    M     .32 ± .05    .22 ± .06    ....
            F     .42 ± .07    .39 ± .07    ....
TABLE V.—CORRELATIONS OF ANALYTICAL SCORES DERIVED FROM PART I WITH FIRST SEMESTER GRADE POINTS AND THE COMPOSITE ENTRANCE EXAMINATIONS; 123 CASES OF MEN; 69 CASES OF WOMEN; 1926-1927 DATA

                                    Men, correlations with          Women, correlations with
Analytical scores                   Grade        Composite          Grade        Composite
                                    points       entrance exams     points       entrance exams
Intellectual and cultural
  status of home                    .237 ± .06   ?                  ?            ?
Economic status of home             .047 ± .06   ?                  ?            ?
Absence of formal religious
  training                          .365 ± .05   .071 ± .06         .074 ± .08   ?

* Correlation not determined.
It seemed worth while to divide Parts I and III into separate scores. Three such analytical scores were made from Part I as follows: (1) Intellectual and cultural status of the home: twelve items asking for the number of works of art, magazines, books, periodicals, and musical instruments in the home, and for the parents' education; (2) Economic and social status of the home: eight items asking for father's income, ownership of home, automobile, etc.; (3) Absence of formal religious training in the home: four items asking for frequency of church and Sunday school attendance. The twenty-four items making up these three scores are slightly more than half of the items originally scored to make Part I, but they were the only items which seemed worth grouping. Table V presents the resulting correlations. A favorable intellectual and cultural home background is slightly associated with scholastic success. Economic status, as measured by the test, is not related to scholastic success. Absence of church going and Sunday school attendance correlates fairly high with the scholastic success of men but is not associated with the scholastic success of women. The correlations of these scores with the composite entrance examinations are comparatively low.

Again, the intercorrelations of these analytical scores are important. They average so close to zero that they are not presented in detail. The average intercorrelation is +.02 for men and −.06 for women.

Part III of the test was divided into six separate scores. This section of the test calls for the subject to indicate the degree of his like or dislike for certain revealing words and phrases. There is, accordingly, some uncertainty in naming the traits measured. However, the tendencies involved are described in some detail, and the trait names should be interpreted in the light of this description. The six traits are as follows: (1) Previous research had shown that the successful students reacted more positively to such items as "Beethoven," "creative," "freer discussion," "to be open-minded," etc. Thirteen such items were combined to measure intellectual and cultural interests. (2) Earlier studies had shown a tendency for the failing students to react more positively to such items as "church," "be religious," "more use of prayer," etc., while the successful students reacted more positively to such items as "spiritual" and "religious toleration." Nineteen items of this type were combined to measure spiritual and ethical religious interests. (3) Eighteen items were combined to measure freedom from conventional interests. This measure was based on the tendency of the failing students to react positively to such items as "hired man," "early to bed," "steady pay," "clerical work," "be a follower," "obedience," "hard work will win," etc. (4) Political liberalism is the fourth measure. It relies on the fact that the failing groups had reacted excessively positively to "patriotism," "America first," "The Star Spangled Banner," etc., and negatively to "humanity is above any nation." Twelve items were scored to make this measure. (5) The success groups had reacted less positively than the failure groups to such items as "cards," "dancing," "canoeing," and "Pierce Arrow." Eight similar items were combined for a freedom from pleasure-seeking interests score. (6) The final group of items consisted of mechanical interest stimuli such as "wireless," "electricity," "build a radio," etc. To these items the failing students had reacted more positively than the successful students. Nine items formed this group. In the case of these six measures the same scoring methods were applied to both men and women, and all items which seemed to fall within any given group were used.

Table VI presents the correlations of these scores with grade points for men and women. Where there is any possibility that the correlation is statistically significant, the correlations with the entrance examinations were also determined. Only two of the analytical scores for men yield possibly significant correlations with grade points, and in both cases the correlations with the entrance examination percentiles are higher. Of the six scores for women, four yield possibly significant correlations with grade points. Uniformly, the correlations of these scores with the entrance examinations are not as high as for men.

TABLE VI.—CORRELATIONS OF ANALYTICAL SCORES DERIVED FROM PART III WITH FIRST SEMESTER GRADE POINTS AND THE COMPOSITE ENTRANCE EXAMINATIONS; 123 CASES OF MEN, 69 CASES OF WOMEN; 1926-1927 DATA

                                          Men, correlations with          Women, correlations with
Analytical scores                         Grade        Composite          Grade        Composite
                                          points       entrance exams     points       entrance exams
A. Intellectual and cultural interests    .223 ± .06   .502 ± .04         .250 ± .08   .400 ± .07
B. Spiritual and ethical religious
   interests                              .084 ± .06   *                  .301 ± .07   .390 ± .07
C. Freedom from conventional interests    .296 ± .06   .427 ± .05         .189 ± .08   .160 ± .08
D. Political liberalism                   .085 ± .06   *                  .351 ± .07   .250 ± .08
E. Freedom from pleasure-seeking
   interests                              .096 ± .06   *                  .088 ± .08   *
F. Non-mechanical interests               .070 ± .06   *                  .016 ± .08   *

* Correlation not determined.

The intercorrelations of these six scores are particularly important. Throughout the research with this part of the test, it has been realized that the differential interest responses of failing and successful groups of students may be due to some common factor. That is, the successful students may be more clever in presenting a favorable picture of themselves, or one group may be consistently more sincere than the other. If such a common factor is operating, the intercorrelations should be high. Table VII presents these data. The average intercorrelation is .155 for men and .198 for women. The correlation between the two scores which yield possibly significant r's with grades for men is −.066. For the women, the average intercorrelation of the four scores which yield possibly significant r's with grades is .297. While these intercorrelations are low, their lowness may be due in part to the unreliability of the separate scores.


TABLE VII.—INTERCORRELATIONS OF ANALYTICAL SCORES FROM PART III; 123 CASES OF MEN (M), 69 CASES OF WOMEN (F); 1926-1927 DATA (FOR VARIABLES SEE TABLE VI)

Variables   Sex   A       B       C       D       E       F
A           M     ....    .269   −.038   −.066    .057   −.061
            F     ....    .213    .035    .085   −.302   −.351
B           M     .269    ....    ?       .099    .063    .171
            F     .213    ....    ?       .464    .046    .097
C           M    −.038    ?       ....    .350    .280    .328
            F     .035    ?       ....    .502    ?       .317
D           M    −.066    .099    .350    ....    .434    .092
            F     .085    .464    .502    ....    .380    .329
E           M     .057    .063    .280    .434    ....    .206
            F    −.302    .046    ?       .380    ....    ?
F           M    −.061    .171    .328    .092    .206    ....
            F    −.351    .097    .317    .329    ?       ....
SUMMARY


1. The composite information blank yields a prediction of first 
semester grades which compares favorably with the predictive power 
of the four entrance examinations at the University of Iowa. The 
partial correlations demonstrate that it is measuring substantial inde- 
pendent factors. 

2. Separate scores from Parts I, II, and III of the information 
blank yield substantial correlations with grade points. Parts I and II 
give comparatively low correlations with the composite entrance 
examinations, while Part III yields a comparatively high correlation 
with the composite entrance examinations. 

3. Three of the analytical scores based on Part I yield significant correlations with grade points. These scores correlate low with the entrance examinations. Their intercorrelations are essentially zero.

4. Six of twelve analytical scores based on Part III yield possibly 
significant correlations with grades. These scores correlate relatively 
high with the entrance examinations. Their intercorrelations are 
relatively low. 

5. If the information blank is to be of real service, it must provide 
the maximum number of analytical scores measuring factors other 
than intelligence which are involved in scholastic success. This study 
reports some success with eight such analytical scores. In several 
instances, however, they do not function for both men and women. 
The scores from Part III correlate highly with the entrance examinations. 
The earlier study describes techniques which with further study should 
yield the necessary improvements. 
A NOTE ON THE CORRECTNESS OF CERTAIN
ERROR FORMULAS¹


HARL R. DOUGLASS 


University of Minnesota 


In an article published in 1925, Holzinger and Clayton² state that the formula for the standard error of a coefficient of correlation estimated by means of the Spearman-Brown prophecy formula, as given by Shen,³ is inaccurate because of the neglect of terms of higher order.

Shen's formula may be written:

$$\sigma_{r_{nn}} = n(1 - r^2)\left[N\{1 + (n-1)r\}^4\right]^{-1/2} \tag{1}$$


in which $r_{nn}$ is the correlation to be expected between n similar forms
of a test and n other such similar forms, r is the correlation between two 
forms, and N is the number of individuals making the test scores from 
which r is computed. 

The authors offered as a more accurate formula:

$$\sigma_{r_{nn}} = n(1 - r^2)\left[N\{1 + (n-1)r\}^4 + (n-1)^2\{1 + (n-1)r\}^2(1 - r^2)^2\right]^{-1/2} \tag{2}$$


The superiority claimed for the latter formula by Holzinger and Clayton is that, unlike Shen's, it does not in its derivation neglect higher orders of an expansion involved and is therefore more accurate. In a later article Shen offered very convincing empirical data to show that the formula of Holzinger and Clayton gave values further from the true values than the Shen formula, but did not present a mathematical proof of the systematic superiority of the Shen formula.⁴ The object of this note is to investigate the relative accuracy of the two formulas.





1 The author is indebted to Dr. W. E. Milne, Professor of Mathematics, University of Oregon, for suggesting the use of the series indicated in (5) and its usefulness in the problem under discussion.

2 Holzinger, Karl J. and Clayton, Blythe: Further Experiments in the Application of the Spearman-Brown Prophecy Formula. Journal of Educational Psychology, May, 1925, Vol. XVI, pp. 289-299.

3 Shen, Eugene: The Standard Error of Certain Estimated Coefficients of Correlation. Journal of Educational Psychology, October, 1924, Vol. XV, pp. 462-464.

4 Shen, Eugene: A Note on the Standard Error of the Spearman-Brown Formula. Journal of Educational Psychology, February, 1926, Vol. XVII, pp. 93-94.



First it should be noted that (2) will of necessity give a smaller value than (1), since the bracketed quantity carrying the negative exponent is, for all values of N, n and r (positive or negative, greater or less than unity), greater in (2) than in (1): (2) adds to it the non-negative term $(n-1)^2\{1 + (n-1)r\}^2(1 - r^2)^2$.
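The inequality can be illustrated by evaluating the two formulas side by side. The Python sketch below transcribes (1) and (2) as reconstructed above; the function names and the sample values of r, n and N are illustrative only.

```python
import math


def se_shen(r, n, N):
    """Formula (1): Shen's standard error of the Spearman-Brown coefficient r_nn."""
    a = 1 + (n - 1) * r
    return n * (1 - r ** 2) / math.sqrt(N * a ** 4)


def se_holzinger_clayton(r, n, N):
    """Formula (2): the Holzinger-Clayton alternative."""
    a = 1 + (n - 1) * r
    return n * (1 - r ** 2) / math.sqrt(N * a ** 4 + (n - 1) ** 2 * a ** 2 * (1 - r ** 2) ** 2)


for r, n, N in [(0.5, 2, 100), (0.3, 5, 50), (0.8, 3, 400)]:
    print(r, n, N, round(se_shen(r, n, N), 4), round(se_holzinger_clayton(r, n, N), 4))
```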

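The variation or error in $r_{nn}$ resulting from a change or error of δ in r is

$$n(r + \delta)\{1 + (n-1)(r + \delta)\}^{-1} - nr\{1 + (n-1)r\}^{-1}$$

By definition

$$\sigma_{r_{nn}}^2 = \Sigma\left[n(r + \delta)\{1 + (n-1)(r + \delta)\}^{-1} - nr\{1 + (n-1)r\}^{-1}\right]^2 N^{-1} \tag{3}$$

in which δ denotes a variation in r and the summation is performed for all N variations. Since the bracketed difference reduces to $n\delta\{1 + (n-1)r\}^{-2}(1 + k\delta)^{-1}$, we obtain by algebraic reduction

$$\sigma_{r_{nn}}^2 = n^2\{1 + (n-1)r\}^{-4}N^{-1}\Sigma\delta^2(1 + k\delta)^{-2} \tag{4}$$

in which for the sake of brevity we have set

$$k = (n-1)\{1 + (n-1)r\}^{-1}$$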
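The reduction from (3) to (4) rests on the identity stated above for the bracketed difference. A short Python check compares the direct difference with the reduced form for a few illustrative values of r, n and δ.

```python
def spearman_brown(r, n):
    """Estimated correlation r_nn for a test lengthened n times."""
    return n * r / (1 + (n - 1) * r)


def error_direct(r, n, delta):
    """The bracketed difference in (3): f(r + delta) - f(r)."""
    return spearman_brown(r + delta, n) - spearman_brown(r, n)


def error_reduced(r, n, delta):
    """The same difference in the form used for (4)."""
    a = 1 + (n - 1) * r
    k = (n - 1) / a
    return n * delta / (a ** 2 * (1 + k * delta))


for r, n, delta in [(0.5, 3, 0.05), (0.3, 2, -0.04), (0.7, 5, 0.02)]:
    print(error_direct(r, n, delta), error_reduced(r, n, delta))
```

The identity holds because $\{1+(n-1)r\}(1+k\delta) = 1 + (n-1)(r+\delta)$.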


We may assume that under circumstances where the Spearman-Brown prophecy formula can be legitimately used, the variations in r do not exceed the value of r itself. Therefore

$$-\left[r + \frac{1}{n-1}\right] < \delta < r + \frac{1}{n-1}$$

for n > 1. Dividing by $r + 1/(n-1)$, we get $-1 < k\delta < 1$, as we see by the definition of k.
It is therefore legitimate¹ to expand the factor $(1 + k\delta)^{-2}$ in (4) into a series of powers of kδ, i.e.,

$$\sigma_{r_{nn}}^2 = n^2N^{-1}\{1 + (n-1)r\}^{-4}\left\{\Sigma\delta^2 - 2k\Sigma\delta^3 + 3k^2\Sigma\delta^4 - 4k^3\Sigma\delta^5 + \cdots\right\} \tag{5}$$

If in (5) we neglect sums of higher powers of δ, replace Σδ² by its equivalent $(1 - r^2)^2$, and take square roots, we obtain (1), Shen's formula. Since we may safely assume that the δ's are symmetrically distributed about the zero value, or approximately so distributed, the sums of odd powers will be zero or quite small in comparison with the sums of even powers. Therefore, while it does not seem possible practically to





1 The means of showing how the expansion as indicated above may be accomplished are not limited to the simple one given above.
determine the summation of the series in (5), nevertheless, since the remaining terms, the even powers, are all positive, the value of $\sigma_{r_{nn}}$ obtained from (5) will be greater than that obtained by (1). But we have already noted that the formula proposed by Holzinger and Clayton gives a smaller value than that given by (1). Therefore (2) will give values less accurate than the approximations yielded by (1).
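The argument that the neglected even-power terms can only increase the result can be checked numerically. The Python sketch below assumes, for illustration only, that the δ's follow a symmetric set of normal quantiles scaled by $\sigma_r = (1 - r^2)/\sqrt{N}$; it compares the full sum in (4) with its leading term, which is the term retained in Shen's formula.

```python
import math
from statistics import NormalDist


def variance_terms(r, n, N, points=2001):
    """Return (full sum of (4), leading term only) for a symmetric set of deltas.

    The deltas are normal quantiles scaled by sigma_r = (1 - r^2)/sqrt(N),
    an assumed stand-in for the sampling distribution of r.
    """
    a = 1 + (n - 1) * r
    k = (n - 1) / a
    sigma_r = (1 - r ** 2) / math.sqrt(N)
    deltas = [sigma_r * NormalDist().inv_cdf((i + 0.5) / points) for i in range(points)]
    if any(abs(k * d) >= 1 for d in deltas):
        raise ValueError("the series (5) requires |k * delta| < 1")
    scale = n ** 2 / a ** 4
    full = scale * sum(d ** 2 / (1 + k * d) ** 2 for d in deltas) / points
    leading = scale * sum(d ** 2 for d in deltas) / points
    return full, leading


full, leading = variance_terms(0.5, 3, 100)
print(math.sqrt(full), math.sqrt(leading))
```

For every symmetric pair ±δ, $(1 + k\delta)^{-2} + (1 - k\delta)^{-2} > 2$, so the full sum exceeds the leading term, as the text argues.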

One is led to attempt the derivation of the Holzinger-Clayton formula. The authors state merely that it "is readily obtained by setting up the difference function $f(r_{xy} - \Delta_{r_{xy}}) - f(r_{xy})$, in which $f(r_{xy})$ is $r_{nn}$, squaring, summing for all samples, and making the substitution $\Sigma\Delta^2_{r_{xy}} = (1 - r_{xy}^2)^2$." If one bears in mind that $r_{xy}$ denotes the r of the present article, and $\Delta_{r_{xy}}$ our δ, he will see that the procedure suggested above has been followed in our derivation. It is interesting to note that if one writes (3) as a fraction and employs the inadmissible procedure of summing with respect to δ independently in the numerator and denominator, he will obtain (2), the formula of Holzinger and Clayton.

If we take as an approximation to $\sigma_{r_{nn}}$ the error in $r_{nn}$ resulting from an error in r equivalent to $\sigma_r$ (which must be recognized as a very close approximation, varying from complete accuracy only because of lack of complete symmetry¹), one may arrive at a formula which apparently, for most practical situations at least, yields values closer to (5) than either the formula of Shen or that of Holzinger and Clayton. If we let $\sigma_{r_{nn}}$ denote the standard error of $r_{nn}$ resulting from an error in r equivalent to $\sigma_r$,

$$r_{nn} + \sigma_{r_{nn}} = n(r + \sigma_r)\{1 + (n-1)(r + \sigma_r)\}^{-1}$$

$$r_{nn} - \sigma_{r_{nn}} = n(r - \sigma_r)\{1 + (n-1)(r - \sigma_r)\}^{-1}$$

Subtracting,

$$2\sigma_{r_{nn}} = n(r + \sigma_r)\{1 + (n-1)(r + \sigma_r)\}^{-1} - n(r - \sigma_r)\{1 + (n-1)(r - \sigma_r)\}^{-1}$$


1 It should be remembered that +6, as so taken represents a range of error in 


Tan Within which, because of properties of ¢, upon which it is based, the error in fas, 
will lie 68.26 chances out of 100. Or more simply stated, if we adopt the very 
reasonable assumption that errors in r are chance errors of sampling and are there- 
fore normally distributed, an error in any r greater than o, will occur but 31.74 
times in 100. Therefore that which we consider here as equivalent to 5, is the 


error in fp, greater than which the error in r,,, will not be 31.74 timesin 100. This 
is illustrative of how closely that which we have taken as equivalent to 6, | actually 
approximates it 
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which when simplified gives us 
Oran = (1 — r*)[N*(1 + r{n—1})*- — (1—r?)N4(n—1)4]-*_— (6) 


It should be noted that this formula, like (5), gives values greater than those yielded by (1) and (2). Under what conditions, if any, its value could exceed that obtained from (5) by an amount greater than the discrepancy between (1) and (5), or between (2) and (5), and thus fail to be a closer approximation to the true standard error than (1) or (2), the author is unable to ascertain.

An inspection of (6) will reveal that its value will differ greatly from that of (2) only when n is large and r small; and, as may be seen from (4), when n becomes large the values of (4) and (5) will also differ by increasing amounts from the values of (1) and (2).
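Formula (6) can be checked against its own derivation. The sketch below is ours, not the author's. It assumes $\sigma_r = (1 - r^2)/\sqrt{N}$ and writes (6) in the algebraically equivalent form $n(1 - r^2)/(\sqrt{N}[1 + (n-1)r]^2 - (n-1)^2(1 - r^2)^2/\sqrt{N})$. It then confirms that this equals half of $r_{nn}(r + \sigma_r) - r_{nn}(r - \sigma_r)$ and prints its ratio to Shen's formula (1) for a few illustrative values of n and r. Formula (2) is not coded because its exact form does not appear in this part of the article. The function names are hypothetical.

```python
import math


def spearman_brown(r: float, n: float) -> float:
    """Reliability r_nn of a test lengthened n times."""
    return n * r / (1 + (n - 1) * r)


def shen_se(r: float, n: float, N: int) -> float:
    """Formula (1)."""
    return n * (1 - r * r) / (math.sqrt(N) * (1 + (n - 1) * r) ** 2)


def formula_6(r: float, n: float, N: int) -> float:
    """Formula (6): n(1 - r^2) / (sqrt(N) a^2 - (n - 1)^2 (1 - r^2)^2 / sqrt(N)), a = 1 + (n - 1) r."""
    a = 1 + (n - 1) * r
    return n * (1 - r * r) / (math.sqrt(N) * a * a - (n - 1) ** 2 * (1 - r * r) ** 2 / math.sqrt(N))


def half_difference(r: float, n: float, N: int) -> float:
    """Half of r_nn(r + sigma_r) - r_nn(r - sigma_r), with sigma_r = (1 - r^2) / sqrt(N)."""
    s = (1 - r * r) / math.sqrt(N)
    return (spearman_brown(r + s, n) - spearman_brown(r - s, n)) / 2


N = 100
for n, r in [(2, 0.8), (5, 0.5), (10, 0.2)]:
    ratio = formula_6(r, n, N) / shen_se(r, n, N)
    print(f"n={n:2d} r={r}: (1) {shen_se(r, n, N):.5f}  (6) {formula_6(r, n, N):.5f}  ratio {ratio:.4f}")
```

With N = 100, the ratio stays very close to 1 for n = 2, r = .8, but rises to about 1.1 for n = 10, r = .2. This is in keeping with the remark that (6) departs most from the other formulas when n is large and r small.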

At any rate, it should be clear that the Holzinger-Clayton formula 
will not yield values as nearly accurate as the Shen formula and that 
formula (6) of this article will furnish values which, in at least most 
situations to which the Spearman-Brown formula would be applied, 
give a still closer approximation than either of the other two. 
THE LAW OF USE


NOEL B. CUFF 


Eastern Kentucky State Teachers College 
1. INTRODUCTION 


An historical survey shows that "experiment in psychology is at least as old as Aristotle."¹² There is a vagueness and indefiniteness about the methods of the nineteenth century, as well as those preceding it. The work was subjective and is therefore open to criticism, to doubt, and to attack. For almost fifty years psychologists have been engaged in an experimental attack on problems dealing with the phenomena of learning. As is to be expected, many of the complex and difficult problems are by no means completely settled.

Ebbinghaus' "Ueber das Gedächtniss," published in 1885, contains the conclusion that each repetition after learning has practically the same effect. This conclusion has permeated our psychological and educational literature. Pedagogical formulae continue to emphasize drill, exercise, frequency, and repetition¹, ², ¹⁵, ¹⁶, ¹⁷, ¹⁹, ²² even after a number of experimentalists have reached the conclusion that repetition is not a selective factor¹⁰, ¹⁴, ²¹ in learning and that the fixing value³, ⁴, ⁵, ⁹, ¹¹ of overlearning is questionable.

Thorndike's statement of the "Law of Use" is well known. Professor Woodworth also writes as follows: "Of one law of learning, we
are perfectly sure. There is no doubt that the exercise of a reaction 
strengthens it, makes it more precise and more smooth-running, and 
gives it an advantage over alternative reactions which have not been 
exercised ... ” 

Quotations will not be multiplied, since the law of use is commonly accepted by writers. However, it obviously has a number of limitations. Peterson¹⁴ and Kuo¹⁰ have shown that frequency will not establish a correct response. Thorndike,²¹ too, has recently reached the conclusion that frequency fails to account for the elimination of wrong responses; however, he may still hold that it is applicable after the maladaptive responses have been shunted off.²⁰ Dunlap,⁵ Hubbert,⁹ and the writer⁴ have questioned the fixing value of frequency. Cason refers to the "improbability of the physical mechanism postulated for the law of use. The law of use does not hold in biology generally, and it may be readily seen that the physical sciences also afford no data which illustrate it." Lashley¹¹ speaks of the doctrine which holds that use lessens synaptic resistance as an "unsubstantiated belief" which is not based on neurological data.

The prominence that Ebbinghaus' conclusion has attained, which many writers think is not entirely justified, makes it important that the relation of overlearning to retention be carefully studied to discover whether there is an approximate proportionality between the number of repetitions of a series and the saving of work in relearning made possible thereby.


2. THE PROBLEM


The survey of previous studies not only emphasizes the importance 
of repetition in learning, but also indicates that other studies should 
be made to discover the full value of the law of use. 

The present study is an investigation of the effect of overlearning 
on retention. We have undertaken to answer the following questions: 

1. Shall we expect that a percentage of the readings for overlearning 
will be saved in the relearning twenty-four hours afterward? 

2. Is there a definite ratio between the number of readings and 
this saving? 

3. Is there a relationship between the per cent saved and scores 
made on standardized educational tests? 

4. Is there a difference in the per cent saved on an overlearned series by an individual of a relatively high mental level and by one of his fellows of a lower mental level?

5. Is there a sex difference in the per cent of the total readings saved by girls and by boys?

6. Is there a relationship between the per cent saved and the scores 
made on an emotion test? 

7. Is repetition ever detrimental? 


3. EXPERIMENTAL MATERIAL AND PROCEDURE 


The seventy-five subjects of this experiment were students in 
general psychology at David Lipscomb College during the winter and 
spring quarters of 1927-1928. They ranged in age from sixteen to 
thirty-three years. The median age for girls was eighteen and for 
boys nineteen years. The Army Alpha scores ranged from one 
hundred eighty-eight to seventy-five, with a median of one hundred 
twenty-four; while the Otis scores ranged from sixty-seven to twenty- 
one with a median of forty-eight. The classification of the students 
was as follows: 
Sex        Freshmen    Sophomores
Girls         27           21
Boys          16           11
These students were drawn largely from good homes. They had 
had favorable economic, social, and educational advantages. 

Since it is important that the conditions remain as nearly uniform 
as possible for different individuals and for the same individual at 
different times, we selected consonants, digits, and nonsense syllables 
for our tests. The tests were the same as those used in an earlier study, and the rules governing the selection and arrangement of the consonants, digits, and nonsense syllables may be found in "Relation of Overlearning to Retention," George Peabody College for Teachers, Contributions to Education No. 43.

Jastrow’s memory apparatus was used to expose the successive 
stimuli. The writer gave the tests to groups of fifteen subjects, the 
groups being approximately equal in ability as determined by the com- 
bined rank of the intelligence tests. In order that the myopic sub- 
jects, if there were any, might plainly see the tests, all subjects were 
allowed to select their seats. Hence, the tests were plainly visible 
to all. The tests were given at the same period each day—from two 
to three P.M. The test given on the previous day was relearned and, after a rest of five minutes, a new test, unlike the one relearned, was
given. The writer was very careful to have the exposures regular 
and in time with a metronome which was set at forty beats per minute. 
Two preliminary tests were given to familiarize the subjects with the 
directions and with the apparatus. The subjects apparently had and 
retained the proper attitude toward the experiments. While the tests 
are relatively senseless and homogeneous, as an additional precaution 
different groups took different forms of a test for a degree of over- 
learning. By this procedure any differences found for the degree of 
overlearning could not be interpreted as due either to unequal diffi- 
culty of the tests or to unequal abilities of the subjects. The instruc- 
tions given to the subjects were: 

“This test consists of twelve consonants (12 digits or ten nonsense 
syllables). Two will be shown in the window at the same time, a new 
one being exposed every 1½ seconds. You are to learn them so that you can repeat them without error. As soon as you feel quite certain that you can repeat them correctly, an assistant will give you an opportunity to do so. After you say them aloud, you will be given
four (sixteen or twenty-eight) additional exposures. You should be 
positive that you can recite the test before trying to do so, but you 
should recite it as soon as possible, since a record is to be kept of the 
exposures required for learning. After twenty-four hours, you will 
be shown again the set of consonants (digits or nonsense syllables), 
and a record kept of the number of presentations required for you to 
relearn the set. It is obvious that you must not practice on the test 
between learning and relearning. To practice will completely destroy 
the value of the results.’’ 

We also gave eight standardized tests, each according to its own directions, and secured data from them.


4. RESULTS 


It was deemed best not to reproduce the original data; this material, however, is on file.


TABLE I. QUARTILE DISTRIBUTION SHOWING THE RANGE AND VARIABILITY OF THE NUMBER OF REQUIRED READINGS FOR RELEARNING AFTER TWENTY-FOUR HOURS

                    Consonants           Digits          Nonsense syllables
Times read after
they were learned   4     16    28      4     16    28      4     16     28
Percentile
75                 7.3   6.5   4.9     6.1   5.8   4.5    15.1   16.5   12.5
50                 8.5   8.1   6.3     7.5   7.3   5.6    20.8   23.0   16.7
25                10.5  10.3   8.2     8.5   8.9   6.8    28.3   26.7   23.7

A low score indicates a good performance.


Table I shows that the tests were relatively simple (the medians 
ranging from 5.6 readings for a digit test to 23.0 for a nonsense syllable 
test) and relatively homogeneous (e.g., the medians of the digit tests 
are approximately the same). By comparing the medians it is evident 
that the digit tests were the easiest and the nonsense syllable tests were 
the hardest, while the consonant tests were only slightly harder than 
the digit tests. The medians for the consonant, digit, and nonsense- 
syllable tests which were read four times after they were learned 
are 8.5, 7.5, and 20.8, respectively. 

The results in Table II are the percentages that the readings for relearning after twenty-four hours were of the readings for learning. Hence, one can find the per cent that the readings saved were of the readings for learning by subtracting the quartiles from one hundred. If one subtracts the medians for the consonant tests, 20.1, 14.5, and 17.5, from one hundred, it is obvious that readings after learning may be useful. A comparison of the medians for sixteen and twenty-eight additional readings of the consonants shows that the


TABLE II. QUARTILE DISTRIBUTION SHOWING THE RANGE AND VARIABILITY OF THE PERCENTAGES THAT THE READINGS FOR RELEARNING AFTER TWENTY-FOUR HOURS WERE OF THE READINGS FOR LEARNING

                    Consonants           Digits          Nonsense syllables
Times read after
they were learned   4     16    28      4     16    28      4     16     28
Percentile
75                  ?    0.9   7.5     0.?   11.3  0.8    11.5    5.5   13.5
50                20.1  14.5  17.5    13.?   20.4  20.3   20.6   15.8   24.5
25                25.9  25.7  29.5    25.6   33.1   ?     33.5   27.5   35.5











last twelve readings of the twenty-eight were worse than useless. The medians and quartiles for the digit and nonsense-syllable tests also show that readings after learning may be (1) useful, (2) useless, or (3) worse than useless. The theoretical ideal of drill resulting in


TABLE III. QUARTILE DISTRIBUTION SHOWING THE RANGE AND VARIABILITY OF THE PERCENTAGES THAT THE READINGS SAVED WERE OF THE TOTAL READINGS AFTER TWENTY-FOUR HOURS

                    Consonants           Digits          Nonsense syllables
Times read after
they were learned   4     16    28      4     16    28      4     16     28
Percentile
75                64.3  33.7  18.8    64.1  28.8  15.7    73.3   56.5   35.2
50                55.8  27.8  14.8    50.8  22.8  12.7    67.5   47.8   26.5
25                50.1  22.5   9.5    44.5  17.9   9.3    50.8   38.3   22.3























accuracy and skill must be modified in view of the facts that there 
are diminishing returns from drill and that the value of drill may be 
negative. 
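The subtraction described above can be written out for the consonant and nonsense-syllable medians of Table II. This is a minimal sketch of that arithmetic, not the author's procedure. The labels reuse the text's own terms. Reading a fall in the saving as "worse than useless" is our interpretation of the argument. The names are hypothetical.

```python
# Medians from Table II: readings for relearning as a per cent of readings for
# learning, after 4, 16 and 28 readings beyond the point of learning.
TABLE_II_MEDIANS = {
    "consonants": (20.1, 14.5, 17.5),
    "nonsense syllables": (20.6, 15.8, 24.5),
}
EXTRA_READINGS = (4, 16, 28)


def per_cent_saved(medians):
    """Per cent of the learning readings saved in relearning: 100 minus each median."""
    return [round(100 - m, 1) for m in medians]


def verdicts(saved):
    """Label each step up in extra readings by what it did to the saving."""
    labels = []
    for before, after in zip(saved, saved[1:]):
        if after > before:
            labels.append("useful")
        elif after == before:
            labels.append("useless")
        else:
            labels.append("worse than useless")
    return labels


for material, medians in TABLE_II_MEDIANS.items():
    saved = per_cent_saved(medians)
    print(material, dict(zip(EXTRA_READINGS, saved)), verdicts(saved))
```

For both materials, the step from four to sixteen extra readings raises the saving, and the step from sixteen to twenty-eight lowers it.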

The quartile scores in Table III show, contrary to Ebbinghaus'⁶ conclusion, that each repetition after learning does not have practically the same effect; e.g., as the readings increase for each kind of test the medians decrease, showing that later readings have less value than earlier ones. The percentiles show without exception that the greater the number of readings, the lower are the percentages that the readings saved are of the total readings. Ebbinghaus did not claim that the saving of one repetition out of three continued in that ratio when the number of presentations given in learning was increased beyond sixty-four. Our results indicate, however, that even when the number of presentations is less than sixty-four, there is no approximate proportionality between the number of readings of a series and the saving in work made possible thereby, such as one reading saved out of three after an intermission of twenty-four hours.
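The contrast with a fixed one-in-three saving can be seen directly in the Table III medians. In this sketch we assume, only for illustration, that such a fixed saving would keep the readings saved near 33⅓ per cent of the total readings whatever the amount of practice. The code checks that every material's medians fall instead of staying level. The names are hypothetical.

```python
# Medians from Table III: readings saved as a per cent of the total readings,
# after 4, 16 and 28 readings beyond the point of learning.
TABLE_III_MEDIANS = {
    "consonants": (55.8, 27.8, 14.8),
    "digits": (50.8, 22.8, 12.7),
    "nonsense syllables": (67.5, 47.8, 26.5),
}
# Illustrative assumption: a one-in-three saving would hold this share constant.
CONSTANT_SHARE = 100 / 3


def strictly_falling(values):
    """True when every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


for material, medians in TABLE_III_MEDIANS.items():
    gaps = [round(m - CONSTANT_SHARE, 1) for m in medians]
    print(f"{material:18s} falling={strictly_falling(medians)} gap from 33.3: {gaps}")
```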

The factors correlated in Tables IV and V are distinguished by 
numbers, as follows: 

1. Consonant test which was read four times after it was learned. 

2. Consonant test which was read sixteen times after it was learned. 

3. Consonant test which was read twenty-eight times after it 
was learned. 

4. Digit test which was read four times after it was learned. 

5. Digit test which was read sixteen times after it was learned. 

6. Digit test which was read twenty-eight times after it was 
learned. 

7. Nonsense-syllable test which was read four times after it was learned.

8. Nonsense-syllable test which was read sixteen times after it 
was learned. 

9. Nonsense-syllable test which was read twenty-eight times after 
it was learned. 

The coefficients of correlation were found by the rank-difference method.¹³ The probable errors range from .08 to .06. It is commonly held that a correlation should be four times its probable error. With seventy-five subjects, the coefficients should be as high as .28 for the correlation to be markedly present. Hence, all that can be said of many of the correlations is that "correlation is present but low." Some of the correlations, however, are obviously reliable. The negative correlations in Table IV led us to the conclusion that the individuals who gain most from the additional readings are the ones who require the most readings to learn the series, and vice versa. The results in Table V show that the subjects with the highest affectivity and idiosyncrasy scores on the Pressey X-O Tests for investigating the emotions are the ones with the lowest percentages that the readings for relearning were of the readings for learning. Negative correlations, which space does not permit us to
include, indicate that an individual who has a high score on the per- 
TABLE IV. COEFFICIENTS OF CORRELATION BETWEEN THE PERCENTAGES THAT THE READINGS FOR RELEARNING WERE OF THE READINGS FOR LEARNING AND THE READINGS FOR LEARNING

?        -.26
?        -.30
?         ?
?        -.08
?        -.38











TABLE V. COEFFICIENTS OF CORRELATION BETWEEN THE PERCENTAGES THAT THE READINGS FOR RELEARNING WERE OF THE READINGS FOR LEARNING AND THE PRESSEY X-O TESTS

Factor           1     2     3     4     5     6     7     8     9
Affectivity    -.01  -.14  -.06  -.16  -.18  -.08  -.26  -.08   .04
Idiosyncrasy    .08  -.08  -.01  -.04  -.06  -.08  -.02  -.14   .08


























TABLE VI. MEANS, PROBABLE ERRORS OF THE MEANS, DIFFERENCES BETWEEN THE MEANS, PROBABLE ERRORS OF THE DIFFERENCES, AND CHANCES IN ONE HUNDRED THAT THE TRUE DIFFERENCE IS GREATER THAN ZERO, FOR THE EFFECT OF AN INCREASE OF REPETITIONS ON THE HIGHEST (H) AND LOWEST (L) QUINTILES OF SEVENTY-FIVE CASES

Per cent that the readings saved are of the total readings

Readings after learning          4                          16                          28
                      Mean H Mean L  PE H  PE L   Mean H Mean L  PE H  PE L   Mean H Mean L  PE H  PE L
Consonants             57.4  51.86   1.99  2.58   23.33  27.4    1.10  1.99   13.86  13.6    1.02  1.32
  D and PE of D             5.54   3.3                  4.07   ?                     .26   1.7
  Chances in 100               87                          89                          54
Digits                 47.0  49.27   2.5   2.7    21.5   21.62   1.31  1.46   11.8   12.66   0.7   1.11
  D and PE of D               ?                            ?                       .86   1.3
  Chances in 100               ?                            ?                          67
Nonsense syllables     59.6  59.87   2.95  3.57   41.86  47.53   1.82  2.4    25.8   30.2    1.03  1.26
  D and PE of D               ?                            ?                       4.4   1.6
  Chances in 100               ?                            ?                          67
TABLE VII. MEANS AND PROBABLE ERRORS OF THE MEANS OF THE EFFECT OF AN INCREASE OF REPETITIONS

Per cent that the readings saved are of the total readings after twenty-four hours

Readings after learning        4               16               28
                          Mean    PE      Mean    PE      Mean    PE
Consonants               54.61    .92     27.4    .57     12.74   .56
Digits                   50.96   1.06     22.77   .56     12.49   .39
Nonsense syllables       61.43   1.31     36.43   .98     28.04   .76

PE of the difference
between the means      4 and 16   4 and 28   16 and 28
Consonants               1.08       1.08        .79
Digits                   1.17       1.10        .68
Nonsense syllables       1.64       1.52       1.24

Difference between the means
Consonants              24.21      41.87      14.66
Digits                  28.19      38.47      10.28
Nonsense syllables      25.00      33.39       8.39














centages that the readings saved were of the total readings is likely 
to have a low score on standardized educational tests. The subjects 
with the lowest affectivity and idiosyncrasy scores tend to save 
the largest percentages of the total readings. 
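The rule of four probable errors can be converted into a threshold on r. This sketch assumes the usual probable-error formula for a Pearson coefficient, $0.6745(1 - r^2)/\sqrt{N}$. The article used rank-difference coefficients, whose probable error is somewhat larger, so the result is only an approximation. For N = 75, the threshold works out to roughly .29, close to the .28 quoted, and the probable errors for r between 0 and .38 fall between about .08 and .07. The names are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 standard deviations


def pe_of_r(r: float, N: int) -> float:
    """Probable error of a Pearson r (assumed formula): 0.6745 (1 - r^2) / sqrt(N)."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(N)


def smallest_reliable_r(N: int, multiple: float = 4.0) -> float:
    """Smallest r with r = multiple * PE_r.

    Solves c r^2 + r - c = 0, where c = multiple * 0.6745 / sqrt(N).
    """
    c = multiple * PE_FACTOR / math.sqrt(N)
    return (-1 + math.sqrt(1 + 4 * c * c)) / (2 * c)


print(round(pe_of_r(0.0, 75), 3), round(pe_of_r(0.38, 75), 3), round(smallest_reliable_r(75), 3))
```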

Table VI shows that the percentages that the readings saved are of the total readings are greater for the lowest quintile seven times out of nine. The two exceptions are the consonant tests read four and twenty-eight times after they were learned, the chances being eighty-seven and fifty-four in one hundred, respectively, that further testing would continue to show a difference between the means. The difference between the means of the tests read twenty-eight times after learning is not four times the probable error of the difference (i.e., .26 is not four times 1.7) and is not reliable. This shows that the pupils in the lowest quintile, who are the dullest cases, profit more from additional readings than do those in the highest quintile, who are the brightest ones.
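The "chances in one hundred" can be reproduced from the consonant means and probable errors, as we read them from Table VI. This sketch is our reconstruction of the usual procedure, not necessarily the author's. It assumes the two quintiles are independent, so that $PE_D = \sqrt{PE_H^2 + PE_L^2}$. It converts $D/PE_D$ to standard-deviation units and reads the chance from the normal curve. The names are hypothetical.

```python
from statistics import NormalDist

PE_FACTOR = 0.6745  # probable error = 0.6745 standard deviations

# Consonant rows of Table VI: (mean H, PE H, mean L, PE L) for each number of extra readings.
CONSONANTS = {
    4: (57.4, 1.99, 51.86, 2.58),
    16: (23.33, 1.10, 27.4, 1.99),
    28: (13.86, 1.02, 13.6, 1.32),
}


def compare_quintiles(mean_h, pe_h, mean_l, pe_l):
    """Return D, PE of D, and the chances in 100 that the true difference exceeds zero."""
    d = abs(mean_h - mean_l)
    pe_d = (pe_h ** 2 + pe_l ** 2) ** 0.5  # assumes independent groups
    chances = 100 * NormalDist().cdf(PE_FACTOR * d / pe_d)
    return d, pe_d, chances


for readings, row in CONSONANTS.items():
    d, pe_d, chances = compare_quintiles(*row)
    print(f"{readings:2d} readings: D={d:.2f} PE_D={pe_d:.2f} D/PE_D={d / pe_d:.2f} chances={chances:.0f}")
```

With these inputs, the chances round to 87, 89, and 54, the figures given for the consonant tests. $D/PE_D$ stays well below four in every case.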
Table VII confirms the former tables. Unpublished data show that the differences between boys and girls are not large enough to have practical significance. In addition to the means, the table gives the differences between the means and the probable errors of the differences. The smallest difference between the means is 8.39, with a probable error of 1.24. Since a quotient of four, when D is divided by its probable error, indicates complete reliability, it follows that our obtained difference of 8.39 is not only completely reliable but is sixty-nine per cent larger than it need be to ensure a true difference greater than zero.
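The sixty-nine per cent figure follows from 8.39 / (4 × 1.24). The sketch below uses the digit and nonsense-syllable rows of Table VII. It finds the smallest difference, checks that every difference reaches the four-probable-error criterion, and computes the margin. The names are hypothetical.

```python
# Digit and nonsense-syllable rows of Table VII: (difference between means, PE of the difference).
TABLE_VII = {
    "digits": {"4 and 16": (28.19, 1.17), "4 and 28": (38.47, 1.10), "16 and 28": (10.28, 0.68)},
    "nonsense syllables": {"4 and 16": (25.00, 1.64), "4 and 28": (33.39, 1.52), "16 and 28": (8.39, 1.24)},
}
RELIABLE_RATIO = 4.0  # D / PE_D of four or more counts as complete reliability in the text


def excess_over_criterion(d, pe_d):
    """Per cent by which D exceeds RELIABLE_RATIO * PE_D."""
    return 100 * (d / (RELIABLE_RATIO * pe_d) - 1)


smallest = min(
    ((material, pair, d, pe) for material, rows in TABLE_VII.items() for pair, (d, pe) in rows.items()),
    key=lambda item: item[2],
)
material, pair, d, pe = smallest
print(f"smallest difference: {material}, {pair}: D={d}, D/PE={d / pe:.2f}, excess={excess_over_criterion(d, pe):.0f}%")
```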


5. SUMMARY AND CONCLUSIONS 


An effort has been made in our study to test objectively conclusions relative to the law of use based upon Ebbinghaus' statement that each repetition after learning has practically the same effect. Seventy-five subjects, unselected except that they were students in general psychology classes, took eighteen tests each, making a total of one thousand three hundred fifty.

The following conclusions are drawn from the statistical findings: 

1. There is not a definite ratio between the number of readings and the saving of work made possible thereby, such as one reading saved out of three, or 33⅓ per cent, after an intermission of twenty-four hours. Table VII shows that when the digit tests are read four times after they are learned the saving is 50.96 per cent; when they are read sixteen additional times it is 22.77 per cent; and for twenty-eight additional readings it is 12.49 per cent.

2. The percentages saved may increase, may remain the same, or 
may decrease as the number of readings is increased after a series 
is learned. In other words, additional practice after a series is learned 
may be useful, useless, or worse than useless. 

3. The sex differences revealed by these tests are practically 
negligible. 

4. The brighter subjects who are in the highest quintile profit 
less from additional readings than do the duller ones. There is only 
one reliable exception to this statement shown in Table VI. 

5. An individual who has a high score on the percentages that the 
readings saved were of the total readings is likely to have a low score 
on a standardized educational test. 

6. The subjects with the lowest affectivity and idiosyncrasy scores 
tend to save the largest percentages of the total readings. 
7. These conclusions agree, to some extent, with those cited in this
study by Hubbert, Cason, Dunlap, Lashley, and the writer, and indi- 
cate that the doctrine holding that the frequent passage of a neural 
impulse somehow leads to the fixation of a habit is an unsubstantiated 
belief. 
THE RELIABILITY OF THE GOODENOUGH DRAWING
TEST WITH FEEBLE-MINDED SUBJECTS 


LLOYD N. YEPSEN 


The Training School at Vineland, N. J. 


When a scale purporting to measure the ability of an individual is included in a clinical syllabus, the psychologist is concerned with the reliability of the test for repeated measurement. Certain of the more widely used tests have been subjected to this inquiry. Lincoln¹ reports a reliability of .95 between tests given in the morning and in the afternoon. The majority of the group tests in use have been studied by their authors according to different methods, and the results are reported in the several manuals of instruction. The recent introduction of a newly standardized method for measuring intelligence by drawings, as presented by Goodenough,² has given the clinician a new means of investigating what the author calls intelligence. Because of its evident usefulness and the fact that it investigates a hitherto unplumbed field, the test was undoubtedly included in the examining technique of many clinics throughout the country. After we had used it in our own clinical work for about nine months, the question arose as to its reliability on a second or third administration. Certain scales are notably useful for the initial examination but questionable for a second administration. Porteus,³ for example, does not recommend the use of his maze a second time. Some consider that this impairs the value of his scale, while others do not.

Our object in the experiment reported here was to determine the reliability of the Goodenough drawing test on immediate re-administration a second or third time, using feeble-minded subjects. Goodenough⁴ reports a correlation of .937 ± .006 between scores obtained on two successive days by one hundred and ninety-four first-grade children, and an average correlation (split scale) of .77 for children between the ages of five and ten.


1 Lincoln, E. A.: The Reliability of the Stanford-Binet Scale and the Constancy 
of Intelligence Quotients. Journal of Educational Psychology, Vol. XVIII, No. 9, 
p. 621. 

2 Goodenough, Florence L.: "Measurement of Intelligence by Drawings." World Book Co., 1926.

3 Porteus, S. D.: Guide to Porteus Maze Test. Publications of The Training School at Vineland, N. J., Department of Research, No. 25.

4 Op. cit. 
The subjects in our experiment were thirty-seven feeble-minded
boys between the ages of 9.0 and 18.2. All were in two classes in the 
school department attending school all day. Recent clinical studies 
were available on all the cases including such additional test data as are 
used in this article. Three cases were discarded from an original 
number of forty cases, because of a paralytic condition or observed lack 
of cooperation. 

The experiment¹ was carried out over a two-week period, the test being administered three times with four days intervening between trials. Each child was given a blank sheet of paper and provided with a
pencil. The instructions were those given by Goodenough: “On these 
papers I want you to make a picture of a man. Make the very best 
picture that you can. Take your time and work carefully—try very 
hard and see what good pictures you can make.” The same technique 
was followed during the second and third trials. The papers were then 
scored and re-scored to eliminate initial errors in the scoring.


TABLE I. SHOWING RANGE OF TEST AGE SCORES

Test age    First            Second           Third
            administration   administration   administration

13.5+       1                1                1
13.0
12.5
12.0
11.5
11.0        1   1
10.5        1   2
10.0        1
9.5         1                1                1
9.0         1   2
8.5         2                1                2
8.0         2                1                1
7.5         3                3                2
7.0         ?                2                5
6.5         12               12               8
6.0         ?                3                5
5.5         6                6                6
5.0         1
4.5         4                3                4
4.0
3.5         1                1














1 The writer is indebted to Miss Phillis McMurray of Wellington, New Zealand, for the gathering of the material and the scoring of same.

The test age results for the three administrations are shown in
Table I. 


Table II presents a summary of the data on the thirty-seven cases 
used in the experiment. 


TABLE II. SUMMARY OF DATA (IN YEARS)

                                Range         Median    SD
Chronological age               9.0-18.11     13.7
Stanford-Binet test age         4.8-11.2       7.0
Stanford-Binet IQ               39-90          59
Goodenough
  Trial I, test age             3.6-13.5+      6.5      1.76
  Trial II, test age            3.5-13.5+      6.5      1.46
  Trial III, test age           4.5-13.5+      6.5      1.66
  Trial I, raw score            2-42           14
  Trial II, raw score           2-44           14
  Trial III, raw score          5-43           13





TABLE III. AMOUNT OF CHANGE FROM FIRST TO SECOND ADMINISTRATION

Change (years)   -1.5   -1.0   -.5    0   +.5   +1.0   +1.5   +2.0
Cases               2      5     2   21     4      2      0      1

Median decrease .75 year. Median increase .5 year.


TABLE IV. AMOUNT OF CHANGE FROM SECOND TO THIRD ADMINISTRATION

Change (years)   -1.5   -1.0   -.5    0   +.5   +1.0
Cases               1      2     5   20     6      3

Median decrease .5 year. Median increase .5 year.


TABLE V. AMOUNT OF CHANGE FROM FIRST TO THIRD ADMINISTRATION

Change (years)   -1.5   -1.0   -.5    0   +.5   +1.0
Cases               2      3     3   19     8      2

Median decrease .75 year. Median increase .5 year.

The above analyses of individual variations as shown in Tables 
III, IV and V indicate the close agreement between the results obtained 
on three administrations. Additional evidence of the agreement 
existing between the successive scores is indicated by the coefficients of 
correlation. 
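The proportions behind the conclusions below can be tallied from Tables III, IV, and V. In this sketch, the +1.5 and +2.0 columns missing from Tables IV and V are entered as zero. This keeps every table's total at thirty-seven, but it is our assumption. The names are hypothetical.

```python
# Counts of cases at each amount of change in test age (years), from Tables III, IV and V.
# Tables IV and V print no +1.5 or +2.0 column; those cells are assumed to be zero.
CHANGES = (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)
TABLES = {
    "first to second": (2, 5, 2, 21, 4, 2, 0, 1),
    "second to third": (1, 2, 5, 20, 6, 3, 0, 0),
    "first to third": (2, 3, 3, 19, 8, 2, 0, 0),
}


def summarize(counts):
    """Return (total, % unchanged, % increased, % decreased, cases changing by more than 1.0 year)."""
    total = sum(counts)
    same = sum(c for ch, c in zip(CHANGES, counts) if ch == 0)
    up = sum(c for ch, c in zip(CHANGES, counts) if ch > 0)
    down = sum(c for ch, c in zip(CHANGES, counts) if ch < 0)
    beyond_one_year = sum(c for ch, c in zip(CHANGES, counts) if abs(ch) > 1.0)

    def pct(k):
        return round(100 * k / total)

    return total, pct(same), pct(up), pct(down), beyond_one_year


for name, counts in TABLES.items():
    print(name, summarize(counts))
```

About half to three fifths of the cases are unchanged, and the rest divide roughly evenly between gains and losses. At most three of the thirty-seven cases move by more than a year. This is consistent with the approximate fifty, twenty-five, and twenty-five per cent figures given in the conclusions below.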


TABLE VI. CORRELATIONS

Binet test age and Goodenough test age ..................................... .60
Goodenough test age, first administration, and second administration ...... .89
Goodenough test age, second administration, and third administration ...... .91
Goodenough test age, first administration, and third administration ....... .91


The conclusions drawn from the above data, obtained on thirty-seven feeble-minded subjects, are:

The Goodenough test can be applied successfully to feeble-minded subjects after the original administration, with a high degree of reliability.

Approximately fifty per cent of the cases will remain the same; twenty-five per cent will increase; twenty-five per cent will decrease.

The variability will rarely exceed 1.0 year.

The test itself appears to measure something not entirely covered by 
the Binet. 

It is desirable that additional experiments be carried out in order to 
determine the reliability of increments of growth over definite periods. 





AN EXPERIMENT IN VERBAL LEARNING 
GEORGE D. STODDARD 


University of Iowa 


This study reports some findings concerning the learning of French 
equivalents. There is involved, first of all, a practical problem. An 
analysis of a French grammar commonly used in high schools and col- 
leges over the country shows that it contains 2127 lines of French read- 
ing material to be translated into English and 2878 lines of English to be 
translated into French. Similarly the French-to-English vocabulary 
consists of 1400 words and the English-to-French vocabulary of 2100 
words. Since the difficulty (and therefore the time consumed) in 
passing from English to French is certainly much greater than in the 
case of French to English, a question of some pedagogical significance 
is: What is the advantage, if any, of translating English material into 
French? Does learning in this order show a heavy transfer to the 
reverse order, the latter being admittedly the more useful? The present 
report will throw light on this situation so far as vocabulary alone is 
concerned. 

There is, secondly, a psychological problem in which the use of 
French equivalents is simply illustrative. It is of interest to know, for 
various types of material: 

1. The ease of forming word-associations involving one familiar 
and one unfamiliar term as measured by the number learned in a given 
period. 

2. The relative effectiveness of recall in the practice and reverse 
practice orders. 

3. The relative ease of the practice orders familiar-to-unfamiliar 
and unfamiliar-to-familiar. 

4. The combined weight of practice order and familiarity (or 
unfamiliarity) in recall. 

5. The extent of individual differences in these associations. 

The materials selected consisted of fifty words from Ward's "Minimum French Vocabulary Test Book."¹ All articles were omitted, but
accents were properly inserted and considered a part of the word. Com- 
mon English translations were selected. The list of fifty paired words 
was then prepared as two forms: In Form A (see Table I) the French 





1 Ward, C. F.: “Minimum French Vocabulary Test Book.” The Macmillan
Company, New York, 1926. Pp. 101.
Table I. Visual Vocabulary, Form A

 1. acheter      buy           26. courir       run
 2. aimer        love          27. couvrir      cover
 3. aller        go            28. cravate      tie
 4. Angleterre   England       29. croître      grow
 5. apprendre    learn         30. débarquer    land
 6. arrêt        stop          31. dehors       outside
 7. assez        [illegible]   32. départ       departure
 8. au-dessus    above         33. désirer      desire
 9. autre        other         34. devoir       owe
10. avoir        [illegible]   35. dire         say
11. [illegible]  low           36. diviser      divide
12. bébé         baby          37. donner       give
13. blé          corn          38. doux         sweet
14. boîte        box           39. eau          water
15. brun         brown         40. [illegible]  equal
16. casser       break         41. emprunt      loan
17. centre       center        42. ensuite      next
18. chaleur      heat          43. envoyer      send
19. chaud        warm          44. espérer      hope
20. chien        dog           45. état         state
21. [illegible]  hill          46. être         be
22. complet      suit          47. [illegible]  express
23. connaître    know          48. faire        make
24. content      [illegible]   49. fenêtre      window
25. coudre       sew           50. fils         son


word was given at the left and the English word at the right and the oral 
directions indicated explicitly that the learning was to proceed from 
left to right; in Form B (see Table II) the English word appeared at the 
left and the French word at the right and the association again was 
formed from left to right. In both cases a notched card was provided 
to cover up the right hand term as learning went on, and the method of 
learning was supervised by teachers. 

The subjects consisted of three hundred twenty-eight pupils in 
Grades VII, IX, X, XI, and XII, all but twenty-seven falling in the high
school range. Eleven different schools were represented. The only 
special requirement was that no pupil was to have studied French. The 
cooperating teachers divided the pupils into two approximately equal 
groups by means of alphabetical sampling. The groups participated in 
the experiment simultaneously, occupying different rooms. Group A 
was given Form A, and Group B, Form B. The time allowed for the 
learning was twenty minutes. No writing of any sort was permitted, 
and there was no intimation that a test was to follow. However, it is 
Table II. Visual Vocabulary, Form B

 1. above        au-dessus     26. know         connaître
 2. baby         bébé          27. land         débarquer
 3. be           être          28. learn        apprendre
 4. box          boîte         29. loan         emprunt
 5. break        casser        30. love         aimer
 6. brown        brun          31. low          [illegible]
 7. buy          acheter       32. make         faire
 8. center       centre        33. next         ensuite
 9. corn         blé           34. other        autre
10. cover        couvrir       35. outside      dehors
11. departure    départ        36. owe          devoir
12. desire       désirer       37. [illegible]  content
13. divide       diviser       38. run          courir
14. dog          chien         39. say          dire
15. England      Angleterre    40. send         envoyer
16. [illegible]  assez         41. sew          coudre
17. equal        [illegible]   42. son          fils
18. express      [illegible]   43. state        état
19. give         donner        44. stop         arrêt
20. go           aller         45. suit         complet
21. grow         croître       46. sweet        doux
22. [illegible]  avoir         47. tie          cravate
23. heat         chaleur       48. warm         chaud
24. hill         [illegible]   49. water        eau
25. hope         espérer       50. window       fenêtre


the belief of all the teachers that a good rapport was established and that 
the pupils were well motivated to do their best. The process was some- 
thing of a competitive game, taking on additional interest through its 
novelty. 

Immediately following the learning period a specially designed
recall test (see Table III) was given to the subjects. This test
was the same for both groups. Column I, consisting of twenty-five
items, presented French words for which English equivalents were to be
inserted on the lines, while the twenty-five items of Column II presented
the English word and called for the insertion of the French equivalent.
The words in each column were drawn from the original learning series
of fifty words in such a way as to secure a comparable sampling free from
location or serial association. A period of fifteen minutes was allowed
for this test. This arrangement automatically provided the combinations
of learning and testing necessitated by the various aspects of the
problem previously listed. The psychological phases may now be
taken up in order.
Table III. Test of Visual Vocabulary

Instructions: No matter which way you learned the words, fill in all the blanks
you can for both columns. You have fifteen minutes.

Column I                          Column II
Give the English for:             Give the French for:
[The twenty-five items in each column are illegible in the source.]
1. Number of Associations. The three hundred twenty-eight pupils
comprising both groups secured a mean score of 14.1 in Column I
(unfamiliar-to-familiar), 7.0 in Column II (familiar-to-unfamiliar),
and 21.1 in both columns. That is to say, when the learning order
is partialed out (in this case by cancellation), the ratio of equivalents
for French words to equivalents for English words is two to one. The
familiarity factor here is a function of the test only, not of the practice
period.

2. The Effect of Practice Order. Since Group A and Group B were both
tested on Column I and Column II, a measure of the effect of retaining
or reversing the practice order is obtained. Column I of the test is the
direct practice order for Group A and the reverse practice order for
Group B. If the order of learning makes a difference, we should expect
Group A to have the advantage in Column I, and that is the way it comes
out. In Column I the mean for Group A is 15.1, for Group B, 13.1. This
difference of two points is five times the PE of the difference between
the means, and is therefore significant. Similarly, Group B should surpass
Group A in Column II, since the recall order here is the same as the
learning order for Group B. This, too, proves to be the case. The mean
for Group A in Column II is 6.0 and for Group B, 8.0, a difference
between means which is six times its PE.
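The cancellation used in section 1 and the direct-versus-reverse comparison in section 2 can both be read off the four cell means reported above. The following is a minimal Python sketch; the dictionary layout and function names are ours, and giving the two groups equal weight assumes equal group sizes, whereas the text says only that the groups were approximately equal.

```python
# Mean number of correct equivalents per cell, as reported in the text.
# Column I: French word given, English asked (unfamiliar-to-familiar).
# Column II: English word given, French asked (familiar-to-unfamiliar).
CELL_MEANS = {
    "A": {"I": 15.1, "II": 6.0},  # Group A practiced French-to-English (Form A)
    "B": {"I": 13.1, "II": 8.0},  # Group B practiced English-to-French (Form B)
}

# The test column whose direction matches each group's practice order.
PRACTICED_COLUMN = {"A": "I", "B": "II"}


def column_means(cells):
    """Average each column over the groups, weighting the groups equally.

    Each group's practiced direction falls in a different column, so the
    practice order cancels out of these column averages.
    """
    return {col: sum(group[col] for group in cells.values()) / len(cells)
            for col in ("I", "II")}


def direct_and_reverse(cells, practiced):
    """Sum of cell means recalled in the practiced direction vs. the reverse one."""
    direct = sum(cells[g][practiced[g]] for g in cells)
    reverse = sum(value for g in cells for col, value in cells[g].items()
                  if col != practiced[g])
    return direct, reverse


means = column_means(CELL_MEANS)
print(round(means["I"], 1), round(means["II"], 1), round(means["I"] / means["II"], 2))
print(tuple(round(x, 1) for x in direct_and_reverse(CELL_MEANS, PRACTICED_COLUMN)))
```

The column averages reproduce the 14.1 and 7.0 of section 1 (a ratio of about 2.01), and the practiced and reversed cells total 23.1 and 19.1, the four-point advantage that section 2 splits between Columns I and II.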

3. The Familiarity Factor in the Practice Period. All the learning
of Group A proceeded in the order unfamiliar-to-familiar, while all the 
learning of Group B proceeded in the order familiar-to-unfamiliar. Are 
there differences in the effectiveness of learning, as measured by 
immediate recall, which are ascribable to this one factor? To make a 
valid comparison here it is necessary to control the factor of practice 
order—already shown to be of consequence. This may be done simply 
by using only the mean scores for the sum of the two columns. The 
mean for Group A was 21.1, for Group B, 20.9. Since this difference is 
only .14 of the PE of difference between the means it is of no significance. 
The practice order appears to be the deciding influence in determining 
the total number of associations made. 
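Sections 2 and 3 judge significance by dividing a difference between group means by the probable error (PE) of that difference. The text does not report the standard deviations of the separate groups, so the minimal Python sketch below only encodes the standard relations for two independent groups; the function names are ours, and 0.6745 is the usual factor converting a standard deviation to a probable error.

```python
import math

PE_PER_SD = 0.6745  # one probable error is 0.6745 standard deviations


def pe_of_mean(sd, n):
    """Probable error of a group mean from its standard deviation and size."""
    return PE_PER_SD * sd / math.sqrt(n)


def pe_of_difference(pe_1, pe_2):
    """PE of the difference between the means of two independent groups."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)


def critical_ratio(mean_1, mean_2, pe_diff):
    """Number of PEs separating two means; large ratios are read as significant."""
    return (mean_1 - mean_2) / pe_diff


# Section 2 states that the 2-point gap in Column I is five times its PE,
# which implies a PE of difference of about 0.4.
print(round(critical_ratio(15.1, 13.1, 0.4), 2))
```

Section 3 applies the same ratio to the totals, 21.1 against 20.9, and finds it far too small to matter.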

4. The Combined Weight of Practice Order and the Familiarity
Factor. It appears that such a combination would not add significantly
to the findings for practice order alone. However, with respect to the
stimulus word in the test, the unfamiliar French calls up (for both
groups) a greater number of associations than the familiar English.
Thus even Group B, which studied only English to French, obtained a
mean of 13.0 equivalents for the French stimulus words in the test,
and a mean of only 8.0 for the English stimulus words.

5. Individual Differences. The total scores ranged from two to
fifty out of a possible fifty. There was one perfect score in the three 
hundred and twenty-eight cases. The standard deviation was 9.0. 
It will be recalled that the mean was 21.1. It is clear that there exist 
very large individual differences among high school students in the 
skill represented in this experiment. 


SUMMARY 


Some conclusions from these results are the following: 


1. Wherever translation of French words is the important consideration,
the learning order should be French-to-English.

2. In the learning period the sequence of association significantly
affects the recalled products.

3. In the testing period the more fertile stimulus is the unfamiliar
one.


4. There are wide differences among students in ability to establish 
word associations. 
THE MEASUREMENT OF ACHIEVEMENT IN
GENERAL INORGANIC CHEMISTRY 


VICTOR H. NOLL 


University of Minnesota 


At the end of their second quarter of instruction in General Inor- 
ganic Chemistry in the University of Minnesota a group of fifty-four 
students were given a comprehensive objective examination of one 
hundred thirty-five items based on the work of the course.¹ After completing
the third quarter of instruction in the subject, these students
were again given the same examination. Twelve of these students were
women; the rest were men. Most of the men were in pre-medical or
pre-dental courses and the women were students in physical education 
or the Arts College. None had had chemistry in high school. When 
they took the examination the first time they had had about twenty-two 
weeks of instruction consisting of three hours of lectures and four hours 
of individual laboratory experimentation per week. At the end of 
the third quarter they had had thirty-three weeks of instruction. 

This testing procedure provided some data of value for several 
purposes. First, it gave an objective measure of the improvement of 
this group of students through their third quarter of instruction.²
In the second place, it afforded an opportunity to study the relative
effectiveness of different methods of handling test scores. It is with this
second problem that this paper deals. The discussion to follow presents 
an evaluation of three kinds of scores: (1) The total raw score; 
(2) raw scores based upon fifteen scale items; (3) weighted scale 
scores and their use as measuring and predicting devices. 

In order to attack the problem more effectively, the examination 
was scaled according to Van Wagenen’s technique,³ as follows: From
over six hundred of these examinations as taken by students in General 
Inorganic Chemistry at the University of Minnesota, a random sampling 





1 The reliability of this examination, as measured by correlating scores based 
on alternate items and applying the Spearman-Brown formula, was found to be 
.86 ± .02. It also gave a correlation coefficient of .71 ± .03 with scores in the
Iowa Placement Examination for measurement of Chemistry training. The 
examination, therefore, can be considered a fairly reliable and valid measure of 
achievement in General Inorganic Chemistry. 

2 This phase of the problem will be only incidentally discussed in the present 
paper. 

3 Van Wagenen, M. J.: Historical Information and Judgment in Pupils of 
Elementary Schools. New York. Contributions to Education, No. 101, Teach- 
ers College, Columbia University, 1919, Sections V and VI, pp. 39-50. 
consisting of one hundred papers was taken. Then the number of
correct and incorrect responses on each of the one hundred thirty-five 
items was tabulated for the one hundred papers. This gave a measure 
of the relative difficulty of each item in the examination. On the 
basis of this tabulation the PE values, corresponding to the degree of 
difficulty of each item in the examination, were found. Finally, fifteen 
items were selected for the scale, varying in difficulty from +.612 PE
to −2.188 PE and at intervals from each other of approximately .2 PE.
These items, arranged in order of difficulty, were divided into three 
groups, five hard, five medium, and five easy. The median value of 
9.7 median deviations or PE above just not any ability in General 
Inorganic Chemistry was arbitrarily assigned to the third most difficult 
item. The most difficult item, therefore, had a value of 10.1 Md. 
Dev. or, more conveniently, 101, and the easiest, a value of 7.3 Md. 
Dev., or 73, the difference between each item and the succeeding one 
being .2 Md. Dev., or PE as has been said above. 
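The conversion from item difficulty to PE values can be sketched as follows. This minimal Python sketch assumes, as is usual for this kind of scaling, that ability is normally distributed and that an item's difficulty is the point, measured in PE from the group median, which exactly the proportion of papers answering it correctly exceeds; the function names are ours. The second function reproduces the anchoring described above, in which the third most difficult item is assigned 9.7.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()
# One probable error in standard-deviation units (the upper-quartile z, about 0.6745).
PE_IN_SD = STANDARD_NORMAL.inv_cdf(0.75)


def difficulty_in_pe(p_correct):
    """Difficulty of an item, in PE above the group median, from its proportion correct.

    Items passed by fewer than half the papers come out positive (hard),
    items passed by more than half come out negative (easy).
    """
    return STANDARD_NORMAL.inv_cdf(1.0 - p_correct) / PE_IN_SD


def anchor_scale(difficulties, anchor_index, anchor_value):
    """Shift all difficulties so that the item at anchor_index gets anchor_value."""
    shift = anchor_value - difficulties[anchor_index]
    return [d + shift for d in difficulties]


# Fifteen items from +.612 PE down to -2.188 PE in steps of .2 PE,
# hardest first, with the third hardest item set to 9.7 Md. Dev.
spaced = [0.612 - 0.2 * i for i in range(15)]
scale_values = anchor_scale(spaced, anchor_index=2, anchor_value=9.7)
print([round(10 * v) for v in scale_values])
# [101, 99, 97, 95, 93, 91, 89, 87, 85, 83, 81, 79, 77, 75, 73]
```

Multiplying by ten gives the 101 to 73 values used for the items in the text.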

Both examinations for each student were scored in three differ- 
ent ways. First they were scored on the basis of all of the one hundred 
thirty-five items which gave the total raw scores; second, raw scores 
based only on the fifteen items used in the scale were obtained; third 
the weighted scores on the scaled items were obtained. The highest 
possible total raw score is one hundred thirty-five; the highest possible 
raw score on the scale items is fifteen; and the highest possible weighted 
scale score is one hundred seventeen. Thus six scores were obtained 
for each student, i.e., first and second total raw scores, first and second
raw scores based on scale items only, and first and second weighted 
scale scores. For various reasons but forty-six of the fifty-four cases 
could be used throughout, so all results are based on these forty-six 
cases. Table I shows the data for these forty-six students.

Let us first examine the three types of scores as measures of group 
achievement. Inspection of Table II shows a very large gain of the
group in terms of total raw scores, a great deal smaller one in terms of
raw scores based on scale items only, and one intermediate between these
two in terms of weighted scale scores. In the first case, the gain is
more than one standard deviation (first testing), in the second case,
about one-half of a standard deviation (first testing), and in the third case
about four-tenths of a standard deviation (first testing). Applying the
formula from Kelley,¹


$$\sigma_{\text{diff. means}} = \pm\sqrt{\sigma^2_{M_x} + \sigma^2_{M_y} - 2\,r\,\sigma_{M_x}\,\sigma_{M_y}}$$


1 Kelley, T. L.: “Statistical Method.” New York: The Macmillan Co., 1923,
p. 182.
Table I. Raw Scores, Scale Scores, Weighted Scale Scores and Gains

      Total raw scores     Raw scores based on  Weighted scale scores
                           fifteen scale items only
No.   First Second Gains   First Second Gains   First Second Gains
  1      65     85    20       4      7     3      76     85     9
  2      69     87    18       2      7     5      67     89    22
  3      70     96    26       8     12     4      87    106    19
  4      70     69    -1       4      5     1      79     83     4
  5      70     75     5       4      6     2      77     85     8
  6      71     59   -12       6      6     0      83     82    -1
  7      72     86    14       6      6     0      86     82    -4
  8      73     91    18       8      9     1      89     93     4
  9      73     97    24       4      9     5      74     89    15
 10      73     84    11       6      9     3      83     91     8
 11      75     96    21       9      9     0      93     94     1
 12      76     92    16       7      8     1      87     91     4
 13      76     95    19       5      8     3      82     89     7
 14      76     99    23       5      9     4      79     89    10
 15      78     89    11       7      7     0      87     86    -1
 16      78     83     5       5      7     2      81     85     4
 17      79    101    22       8      8     0      89     89     0
 18      79    102    23       8      9     1      89     91     2
 19      79     79     0       4      5     1      76     78     2
 20      80     90    10       5      8     3      81     89     8
 21      80    101    21       8      8     0      89     89     0
 22      80     89     9       8      8     0      89     90     1
 23      82     72   -10       7      5    -2      87     81    -6
 24      84     90     6       7      8     1      85     91     6
 25      84     88     4      11     11     0     103     99    -4
 26      85    107    22       5      8     3      80     89     9
 27      87    101    14       7      9     2      87     94     7
 28      88    108    20       8     10     2      89     95     6
 29      89     95     6      10     11     1     101    100    -1
 30      89     99    10       9     11     2      93    100     7
 31      91     98     7       7      5    -2      85     80    -5
 32      92    115    23       6     12     6      87    102    15
 33      93    101     8       8     10     2      89     94     5
 34      93    106    13      10     10     0      98     94    -4
 35      95    111    16      11     10    -1     103     97    -6
 36      95    117    22       9     12     3      92    102    10
 37      95    107    12      10      8    -2      97     89    -8
 38      96    116    20      10     11     1      95     99     4
 39      97    124    27      10     13     3      96    104     8
 40      97    110    13       9     11     2      89     98     9
 41     102    118    16       9     10     1      94     95     1
 42     102     99    -3       7      8     1      89     87    -2
 43     103    108     5      11     12     1     100    102     2
 44     106    109     3      12     11    -1     105     99    -6
 45     107    117    10      12     11    -1     100    100     0
 46     110    120    10      13     12    -1     104    100    -4
Table II. Means, Standard Errors, and Standard Deviations for the
Group, N = 46

                                      Means  Standard    Standard
                                               errors  deviations
First total raw scores                 84.9      1.72       11.65
Second total raw scores                97.4      2.11       14.35
Gain                                   12.5      1.50
First raw scores, scale items only      7.6       .36        2.49
Second raw scores, scale items only     8.9       .31        2.13
Gain                                    1.3      .245
First weighted scale scores            88.7      1.26        8.60
Second weighted scale scores           92.4      1.04        7.05
Gain                                    3.7       .99
we obtain the standard errors, or significance of these gains. These 
are shown in Table II. In total raw scores, the gain is 8.26 times its 
standard error. This means, for all practical purposes, absolute certainty
that the gain is a true one. In raw scores based on the 15 scale
items, the gain is 5.3 times its standard error or again almost certainly 
a true difference. With the weighted scale scores, we find a gain of 
3.75 with a standard error of .99 or a gain of 3.78 times the standard 
error, which means that the chances of the gain being a true one are 
9,998,128 out of 10,000,000. It is worth while to note here that the 
average gain as shown by the total raw scores is by far the largest and 
most significant from a statistical point of view, the weighted scale 
scores’ gain being smallest and least significant. However, all three 
are, for practical purposes, very significant. 
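Kelley's formula uses the correlation between first and second scores because the same students took both examinations. The following minimal Python sketch (function names ours) assumes the standard errors in Table II are σ/√N and pairs them with the first-with-second correlations listed as rows 1 to 3 of Table III. It gives about .99 for the weighted scale scores and .24 for the raw scores on scale items, close to the .99 and .245 in Table II, and about 1.46 for total raw scores where Table II prints 1.50.

```python
import math

N = 46


def se_of_mean(sd, n):
    """Standard error of a mean."""
    return sd / math.sqrt(n)


def se_of_gain(se_first, se_second, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se_first ** 2 + se_second ** 2 - 2 * r * se_first * se_second)


# (SE of first mean, SE of second mean, r between first and second scores)
SCORES = {
    "total raw": (1.72, 2.11, 0.728),
    "raw, scale items": (0.36, 0.31, 0.751),
    "weighted scale": (1.26, 1.04, 0.643),
}

for name, (se1, se2, r) in SCORES.items():
    print(f"{name:17s} SE of gain {se_of_gain(se1, se2, r):.3f}")
```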

The real purpose of a scale seems to be that of measurement and 
prediction in individual cases. If all the time and labor of scaling 
and weighting the items of a test is to be justified, it must be on this 
basis. An attempt was thus made to evaluate the three kinds of
scores here obtained, as measures of improvement and as means of
predicting success in individual cases. The first step was to obtain
the probable error of single measures of each type, using the Otis Median
Difference formula.¹


$$\text{PE (of a single measure)} = \sqrt{\tfrac{1}{2}} \times \text{Med. Diff.}$$


1 Otis, Arthur S.: “Statistical Method in Educational Measurement.” New
York: World Book Company, 1925, p. 250.
The median difference represents the median of all the differences
between the first and second scores in every case, when these differences 
are arranged in a frequency table. 

The PE of a single total raw score is found to be ±9.78, which
means that the chances are even that a student’s true score will not 
differ from his obtained score by more than 9.78 points. The PE 
of a single raw score based on scale items only is 1.15 and that of a 
single weighted scale score is 3.90. A study of individual gains shown 
in Table I reveals some interesting and surprising facts, as follows: 

1. In terms of total raw scores, not one student makes a gain which 
is three times the PE of a single measure and only 30.4 per cent make 
gains which are twice or more than twice as great as the PE. 

2. In terms of raw scores based on the fifteen scale items, only 
five students out of forty-six make a gain which is three or more times 
the PE of a single measure, and only twelve out of forty-six or 26.0 
per cent make gains which are two or more times the PE. 

3. In terms of weighted scale scores, just four students make gains 
which are three or more times the PE of a single measure and only 
thirteen out of forty-six or 28.2 per cent make gains two or more times 
the PE. 
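The percentages in the list above can be checked directly against Table I. The minimal Python sketch below does this for the total raw score gains and the PE of 9.78 given in the text; the function name is ours, and the other two columns of gains can be checked the same way with PEs of 1.15 and 3.90.

```python
# Total raw score gains for the forty-six students, copied from Table I.
TOTAL_RAW_GAINS = [20, 18, 26, -1, 5, -12, 14, 18, 24, 11, 21, 16, 19, 23, 11, 5,
                   22, 23, 0, 10, 21, 9, -10, 6, 4, 22, 14, 20, 6, 10, 7, 23, 8, 13,
                   16, 22, 12, 20, 27, 13, 16, -3, 5, 3, 10, 10]


def count_gains_at_least(gains, pe, multiple):
    """Number of gains that are `multiple` or more times the PE of a single score."""
    return sum(1 for g in gains if g >= multiple * pe)


n = len(TOTAL_RAW_GAINS)
twice = count_gains_at_least(TOTAL_RAW_GAINS, 9.78, 2)
print(twice, round(100 * twice / n, 1))
print(count_gains_at_least(TOTAL_RAW_GAINS, 9.78, 3))
```

It finds fourteen gains of twice the PE or more (30.4 per cent of forty-six) and none of three times the PE, as stated in item 1.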

Aside from the implications of the above facts for the efficiency 
of our instruction of college freshmen (and this is probably not confined 
to the University of Minnesota), there seem to be some points bearing 
on the present problem. There is almost exact agreement between 
weighted scale scores and raw scores based on scale items as to signif- 
icance of gains. Even the total raw score gains, which are two or more 
times the PE, agree very well with the other two measurements. There 
is apparently no advantage to be gained here by weighting the items of 
a test, since we obtain practically the same results whether we weight 
them or not, as far as reliability of single measures is concerned. 

In order to compare these methods of scoring with respect to pre- 
dictive value, the coefficient of correlation (Pearson Product Moment) 
was calculated between first and second scores obtained by each of the 
three methods. These are shown in Table III. To obtain some 
measure of the predictive value of the scores in individual cases, the 
Kelley formula for the coefficient of alienation¹ was applied.


$$k = \sqrt{1 - r^2}$$








1 Kelley, T. L.: Op. cit., p. 173. 
The coefficient of correlation between first and second total raw
scores was found to be .728, which gives a value for k of .685. This 
means that the total raw scores are better than a random guess for 
predictive purposes in three hundred fifteen cases out of a thousand. 
The coefficient of correlation between first and second raw scores based on 
the scale items was .751, which gives a value for k of .436. This means 
that for predictive purposes these scores are better than pure guess in 
five hundred sixty-four cases out of a thousand. The value of r 
for the weighted scale scores is .643 and the value obtained for k is .765, 
which means that prediction from weighted scale scores is better than 
guessing in but two hundred thirty-five cases out of a thousand! We 
have here evidence of a decided advantage in prediction for the raw 
scores based on scale items. Also, the total raw scores seem superior in 
this instance to the weighted scale scores. The difference in favor of 
the total raw scores might be ascribed to the greater number of items 
and hence increased reliability, but this certainly does not explain the 
clear superiority of the raw score-scale-items method. 
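The coefficient of alienation and the "cases out of a thousand" figures above can be recomputed from the three correlations. In this minimal Python sketch (function name ours), 1 − k is the quantity the text reads as the gain over a random guess. For r = .728 and r = .643 it gives k of about .686 and .766, agreeing with the .685 and .765 above to rounding. For r = .751, however, √(1 − r²) is about .660, so 1 − k is about .340; the .436 printed above equals 1 − r² without the square root.

```python
import math


def coefficient_of_alienation(r):
    """Kelley's k = sqrt(1 - r^2): 1 for no relation, 0 for perfect prediction."""
    return math.sqrt(1.0 - r ** 2)


CORRELATIONS = {
    "total raw scores": 0.728,
    "raw scores, scale items": 0.751,
    "weighted scale scores": 0.643,
}

for name, r in CORRELATIONS.items():
    k = coefficient_of_alienation(r)
    print(f"{name:24s} r = {r:.3f}   k = {k:.3f}   "
          f"per thousand better than guess: {1000 * (1 - k):.0f}")
```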

In addition to the correlation coefficients cited above, additional 
ones were calculated as indicated in Table III. 


Table III. Correlations¹

    Variables correlated                                                   r   PE of r
 1. Total raw scores, first scores with second scores                   .728      .046
 2. Raw scores on scale items, first scores with second scores          .751      .043
 3. Weighted scale scores, first scores with second scores              .643      .058
 4. First total raw scores with first raw scores on scale items         .691      .052
 5. First total raw scores with first weighted scale scores             .741      .045
 6. First raw scores on scale items with first weighted scale scores    .973      .005
 7. Second total raw scores with second raw scores on scale items       .809      .034
 8. Second total raw scores with second weighted scale scores           .678      .053
 9. Second raw scores on scale items with second weighted scale scores  .935      .012
10. Total raw score gains with raw score gains on scale items           .482      .076
11. Total raw score gains with weighted scale score gains               .537      .069
12. Raw score gains on scale items with weighted scale score gains      .885      .021

1 All correlations have been calculated by the Pearson product-moment formula.
Correlations 1, 3, 5, 8, and 11 were calculated from grouped data; Nos. 2, 4, 6, 7,
9, 10, and 12 were calculated from ungrouped data.
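The probable errors in Table III agree, to within about .002, with the usual formula for the probable error of a correlation coefficient, PE = 0.6745 (1 − r²)/√N, with N = 46. A minimal Python check follows; the dictionary simply transcribes the table, and the function name is ours.

```python
import math

N = 46

# Row number: (r, printed PE of r), transcribed from Table III.
TABLE_III = {
    1: (0.728, 0.046), 2: (0.751, 0.043), 3: (0.643, 0.058),
    4: (0.691, 0.052), 5: (0.741, 0.045), 6: (0.973, 0.005),
    7: (0.809, 0.034), 8: (0.678, 0.053), 9: (0.935, 0.012),
    10: (0.482, 0.076), 11: (0.537, 0.069), 12: (0.885, 0.021),
}


def pe_of_r(r, n):
    """Probable error of a Pearson correlation coefficient."""
    return 0.6745 * (1.0 - r ** 2) / math.sqrt(n)


for row, (r, printed) in TABLE_III.items():
    print(f"{row:2d}  r = {r:.3f}  printed PE = {printed:.3f}  "
          f"computed PE = {pe_of_r(r, N):.3f}")
```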
















464 The Journal of Educational Psychology 3 


For convenience in studying these correlations, they may be con- 
sidered in four groups of three each. The first group of three represents 
the correlation of first with second scores of each type. Here we find 
a correlation between first and second raw scores on scale items which is 
more than ten points higher than that between first and second weighted 
scale scores. It is also slightly higher than that between first and 
second total raw scores. If these correlations are any indication of the 
reliability of the scores obtained by the various methods, there seems 
little question as to the best one. 

In the second group of three correlations (4, 5, and 6) between 
first scores obtained by the various methods, we have substantial 
correlations between first total raw scores and first scores by the other 
two methods. However, between first raw scores based on scale items 
and first weighted scale scores, we have the very high correlation of 
.973, which represents almost perfect agreement. If weighted scale 
scores are measuring what we want to measure, there is no evidence here 
that raw scores based on these scale items do not measure it almost 
equally well. 

In the third group of correlations (7, 8, and 9) we have expressed 
the amount of agreement between the second scores of each type. 
We find least agreement between second total raw scores and second 
weighted scale scores and again highest correlation between raw scores 
on scale items and weighted scale scores, the latter coefficient being 
.935. As before, we have very little evidence of any advantage to be 
gained by weighting the items of a scale. 

Finally, in the fourth group of correlations, we have expressed the 
amount of agreement between gains as measured by the three methods 
for the forty-six individuals. We have here correlations of total raw 
score gains with the other two types, of .482 and .537 respectively. The 
correlation between raw score gains based on scale items and weighted 
scale score gains is .885, which again indicates very high agreement 
between these two types of measurement. 


SUMMARY 


The comparison of three methods of scoring an objective examina- 
tion in General Inorganic Chemistry suggests the following points: 

(a) With respect to measurement of group improvement, there is 
little to choose. Each type of measurement shows a significant gain 
for the forty-six students as a whole over eleven weeks of university
instruction. 
(b) The probable errors of single scores vary widely in amount;
but gains in the three types of scores, when expressed in terms of their 
respective probable errors, agree very closely. The agreement between 
total raw score gains and the other two types, expressed both as per 
cent of students making a certain gain and by the correlations between 
gains, is lowest. 

The agreement between gains expressed in terms of raw scores 
based on scale items and gains expressed in terms of weighted scale scores 
is high. Because of this high correlation (.885) and the low ones mentioned
above (.482 and .537), we may conclude that the gains expressed
in terms of raw or weighted scale scores are more likely to be representa- 
tive of the true gains than those expressed by total raw scores. This 
assumption is not entirely justified by factual data but seems reasonable. 

(c) For purposes of prediction, as expressed by the coefficient 
of alienation, the raw scores based on scale items are best, the weighted 
scale scores being least valuable. The total raw scores seem slightly 
better than the weighted scale scores in this respect. It is interesting 
to note that the reliability of the total raw scores, based on all one
hundred thirty-five test items, is less than that of raw scores based on
only 15 scale items. It seems that the differences expressed by the
latter scores are being covered up by a large number of items which are 
practically worthless as far as differentiating between levels of ability is 
concerned. Another interesting question arises here as to the difference 
in reliability which might (or might not) be found if the fifteen scale
items were given alone instead of being given as part of the entire 
test, scattered as they were among many other similar items. 

(d) Finally, there seems clear-cut evidence that weighting items of 
a scale in the present case adds practically nothing to its value in any 
respect, and in some cases (notably in reliability and with respect to the 
coefficient of alienation) actually detracts from its usefulness. 

However, scaling itself, without weighting, seems to add sub- 
stantially to the value of a test. It is probable that selection of fifteen 
items at random from a large number of test items would give a scale
without the trouble of scaling at all, providing that there were equal 
numbers of items at each level of difficulty in the total collection 
of test items. However that may be, there is no evidence in the present 
instance to justify the use of weighted values for the items in this 
scale. 
HOW EFFECTIVE IS SPECIFIC TRAINING IN
PREVENTING LOSS DUE TO THE SUMMER
VACATION?


L. D. MORGAN 
Kansas State Teachers College 


Kirby¹ has shown the effect of specific training in addition and
division. He found that in the spring it required approximately thirty 
minutes of drill in addition to bring the group to the same level of 
efficiency as it had attained the previous fall by seventy-five minutes of 
drill. In division, “more than three-fifths as much practice was 
required to regain the standing reached in the experiment as was 
required at that time to reach it.” Nelson² in a recent study has shown
that it required considerable time to again attain the level of efficiency
reached in the previous spring in most subjects. Bruene³ has likewise
shown the detrimental effect of the summer’s vacation upon the funda- 
mentals of arithmetic, reasoning problems, and spelling, while there is 
an increase in efficiency in reading. This was true in Grade IV. In 
Grades V and VI there was an increase in efficiency in reading and 
nature study, while there was a loss in efficiency in the fundamental 
processes, problem solving, history, language usage and spelling. Other 
writers such as Elder, Kramer,® and Patterson,® have also shown the 
effect of the summer’s vacation upon efficiency in school subjects. 

The present study is concerned with two questions: (1) The effective- 
ness of specific training in preventing loss in efficiency due to summer 
vacation. (2) The significance of specific training. 





¹ Kirby, Thomas J.: Practice in the Case of School Children. Teachers
College, Columbia University Contributions to Education, No. 58, 1913.

² Nelson, M. J.: How Much Time Is Required in the Fall for Pupils in the
Elementary School to Reach Again the Spring Level of Achievement? Journal of
Educational Research, Vol. XVIII, Nov., 1928, pp. 305-308.

³ Bruene, Elizabeth: Effect of Summer Vacation on the Achievement of Pupils
in the Fourth, Fifth and Sixth Grades. Journal of Educational Research, Vol.
XVIII, Nov., 1928, pp. 309-314.

⁴ Elder, H. E.: The Effect of Summer Vacation on Silent-reading Ability in
Intermediate Grades. Elementary School Journal, Vol. XXVIII, March, 1927,
p. 541.

⁵ Kramer, G. A.: Do Children Forget during Vacation? Baltimore Bulletin,
VI, December, 1927, pp. 56-60.

⁶ Patterson, M. V. W.: The Effect of Summer Vacation on Children’s Mental
Ability and on Their Retention of Arithmetic and Reading. Education 36, Dec.,
1925, pp. 222-228.
This study was carried on in a city in southeastern Kansas. Two
Grade VI classes were used, designated in the study as X and Y. The
same teacher taught both groups in the subjects considered in this study.
The following tests were given to both groups on May 11, May 25, and
September 4: the Compass Survey Test in Arithmetic, Form A; the
Thorndike-McCall Reading Scale, Form 8; and Otis’ Reasoning Test in
Arithmetic, Form A. One group, designated Y, was given special training
in the fundamentals of arithmetic, silent reading, and problem
solving. This training lasted for a period of two weeks. The
training in the fundamentals of arithmetic consisted in administering
four diagnostic tests, one each in addition, subtraction, multiplication,
and division. The weaknesses revealed by these diagnostic tests were
then followed up by remedial teaching, which consisted in using the
Economy Remedial Exercise Cards. Ten minutes of practice daily for the
period was devoted to this work.

In reading, mimeographed material similar to that used in the scale
was employed; specific questions were asked daily on the material read,
and three forms of the same test were used. Part of each recitation
period was spent in revealing to the pupils the correct answers to the
questions asked, and the pupils were led to see that the answers were
to be found in the material read.

In problem solving, an analysis was made of the last ten problems
in the test, and five similar problems were then constructed for each of
these ten. The pupils were asked to attend to the following: (1) to
understand each word in the problem, (2) to determine what is given in
the problem, (3) to determine what is required, (4) to select the
different processes to be used in solving the problem, (5) to lay out the
solution by asking: (a) What do I have given? (b) What am I to find?
(c) How am I to use what is given to find what I am to prove? (d) How
can I check my answer to prove that it is correct? and (6) to check the
answer. For each problem every pupil was required to have his plan, as
given in (5) above, checked before he was permitted to proceed in
solving the problem. After his plan was approved, he solved the problem,
and then he was required to check his answer.

After these two weeks of training, the same tests were given to 
both groups. This was the last week of school before the summer 
vacation began. None of the pupils in either group attended summer
school. On September 4, the same tests were given again. Neither
the teacher nor the pupils knew that the tests were to be administered
when school convened after the summer vacation. A summary of the
results is given in the following tables:


















































TABLE I
                   Fundamentals of arithmetic   Reading                     Problem solving
Group        N     May 11   May 25   Sept. 4    May 11   May 25   Sept. 4   May 11   May 25   Sept. 4
Y           40     27.88    35.98    33.42      18.50    22.75    23.03     8.?      11.13    10.90
X           38     25.79    26.00    22.31      18.65    18.94    19.66     8.47     8.86     8.71
Difference         1.09     9.98     11.13      -.15     3.81     3.37      .26      2.27     2.19











It will be seen that group Y was superior to group X in the initial
test in the fundamentals of arithmetic by 1.09 problems and in problem
solving by .26 problems, while group X was superior to group Y in
reading by .15 questions.

After two weeks of training in the fundamentals of arithmetic, group Y
increased in efficiency by 8.10 problems, while group X increased by only
.21 problems. In reading, group Y increased by 4.25 questions, while
group X increased by only .29 questions. In problem solving, group Y
increased by 2.43 problems, while group X increased by only .39 problems.
Table II shows the gains made in AMs (arithmetic means) in the various
subjects during the two weeks of training, and the index of significance
of these gains.

TABLE II
Fundamentals of Arithmetic
Group   AM gain   PE(AM)   SD     PE(diff)   Index of       Difference due to
                                             significance   specific training, AM
Y       8.10      .973     9.13
X       .21       .789     8.04   1.252      6.302          7.86

Reading
Y       4.25      .794     7.45
X       .29       ?        ?      ?          3.526          3.96

Problem Solving
Y       2.43      .401     3.76
X       .39       .341     3.12   ?          ?              2.08

It will be seen that the gain of group Y exceeded that of group X
(1) in the fundamentals of arithmetic by 7.86 problems, (2) in reading
by 3.96 questions, and (3) in problem solving by 2.04 problems.
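Table II's probable errors and indices of significance can be recomputed with formulas that were standard in educational statistics at the time. The article does not state which formulas it used, so the following are an assumption:

- the PE of a mean is .6745 SD/√N;
- the PE of a difference between two independent means is √(PE₁² + PE₂²);
- the index of significance is the difference divided by the PE of the difference.

With N = 40 for group Y, these formulas reproduce the tabled PEs for Y to within about .001. The arithmetic index of 6.302 matches 7.89/1.252, that is, the text's gains (8.10 - .21) divided by the tabled PE of the difference. The function names are hypothetical.

```python
import math

# Assumed formulas (not stated in the article):
#   PE of a mean           = 0.6745 * SD / sqrt(N)
#   PE of a difference     = sqrt(PE_a**2 + PE_b**2)   (independent groups)
#   index of significance  = difference / PE of the difference


def pe_mean(sd: float, n: int) -> float:
    """Probable error of an arithmetic mean."""
    return 0.6745 * sd / math.sqrt(n)


def pe_diff(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent means."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


def index_of_significance(diff: float, pe_d: float) -> float:
    """How many probable errors the observed difference amounts to."""
    return diff / pe_d


if __name__ == "__main__":
    # Group Y, N = 40 (Table I); SDs of the gains from Table II.
    for subject, sd in [("arithmetic", 9.13), ("reading", 7.45), ("problem solving", 3.76)]:
        print(subject, round(pe_mean(sd, 40), 3))

    # Fundamentals of arithmetic: tabled PEs for Y (.973) and X (.789),
    # with the difference between the gains reported in the text.
    d = 8.10 - 0.21
    pe_d = pe_diff(0.973, 0.789)
    print(round(pe_d, 3), round(index_of_significance(d, pe_d), 2))
```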

We now come to the second question: The effect of the summer 
vacation upon the two groups. Were the two groups equally affected? 
In the fundamentals of arithmetic group X lost 3.69 problems, while
group Y lost only 2.56 problems, a difference of 1.13 in favor of
group Y. In reading, however, group X gained .72 questions, while
group Y gained only .28 questions, a difference of .44 in favor
of group X. In problem solving, group X gained .05 problems,
while group Y lost .23 problems. Table III presents a summary of the
results.





























TABLE III
Loss or gain by vacation
                Fundamentals of arithmetic   Reading           Problem solving
Group           Gain      Loss               Gain     Loss     Gain     Loss
X                         3.69               .72               .05
Y                         2.56               .28                        .23
Difference                1.13               .44                        .28











For the Y group, thirty-five pupils lost in efficiency during the 
vacation, three remained the same, while two gained in efficiency in the 
fundamentals of arithmetic. In reading, sixteen pupils gained in 
efficiency, twelve remained the same, while twelve lost in efficiency. In 
problem solving, nine pupils gained in efficiency, fifteen remained the 
same, while sixteen lost in efficiency. 

For the X group, in the fundamentals of arithmetic, thirty-five
pupils lost in efficiency, one gained in efficiency and one remained the 
same. In reading, twenty-five increased in efficiency, five remained the 
same, while eight lost in efficiency. In problem solving, twenty pupils 
lost in efficiency, seven remained the same, while eleven gained in 
efficiency. The results are given in Table IV. 

The ten poorest students in problem solving in group X lost in
efficiency during the vacation, while in group Y the poorest students
gained and the next three remained the same.
TABLE IV
        Fundamentals of arithmetic   Reading               Problem solving
Group   Gain   Loss   Same           Gain   Loss   Same    Gain   Loss   Same
Y       2      35     3              16     12     12      9      16     15
X       1      36     1              25     8      5       11     20     7
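Table IV's counts can also be read as percentages of each group. This is a minimal sketch that assumes the group sizes from Table I (N = 40 for Y, N = 38 for X). The dictionary layout and function names are illustrative only. The sketch also checks that every row accounts for each pupil in its group.

```python
# Pupils who gained, lost, or stayed the same over the vacation (Table IV),
# stored as (gain, loss, same).
TABLE_IV = {
    "Y": {"arithmetic": (2, 35, 3), "reading": (16, 12, 12), "problem solving": (9, 16, 15)},
    "X": {"arithmetic": (1, 36, 1), "reading": (25, 8, 5), "problem solving": (11, 20, 7)},
}
GROUP_N = {"Y": 40, "X": 38}  # group sizes from Table I


def percentages(counts):
    """Convert (gain, loss, same) counts to percentages of the group."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


def check_totals(table, sizes):
    """Raise if any row fails to account for every pupil in the group."""
    for group, subjects in table.items():
        for subject, counts in subjects.items():
            if sum(counts) != sizes[group]:
                raise ValueError(f"{group}/{subject}: {sum(counts)} != {sizes[group]}")


if __name__ == "__main__":
    check_totals(TABLE_IV, GROUP_N)
    for group, subjects in TABLE_IV.items():
        for subject, counts in subjects.items():
            gain, loss, same = percentages(counts)
            print(f"{group} {subject:<16} gain {gain:5.1f}%  loss {loss:5.1f}%  same {same:5.1f}%")
```

On these counts, 87.5 per cent of group Y and about 94.7 per cent of group X lost in the fundamentals of arithmetic.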





























It might be asked: just how much was the two weeks of specific
training worth in the three subjects? The results are given in Table V.
The results in the fundamentals of arithmetic are only approximate,
because the test norms do not express scores in months, so it was
necessary to estimate the gains in this instance.








TABLE V
                          Fundamentals of     Reading,        Problem solving,
Group                     arithmetic, gain,   gain, months    gain, months
                          months
Y                         7                   20              17
X                         2                   1               1 or less
Difference due to
  specific training       5                   19              16











The loss or gain due to summer vacation may also be expressed in 
months, as given in Table VI. 











TABLE VI 
                Fundamentals of arithmetic   Reading           Problem solving
Group           Gain      Loss               Gain     Loss     Gain     Loss
ee ee aes gia | 7 Te i eee | 
_ SES 25 | 2 | 1.5 
Difference.......... | 1.0 | 1 | | 1.5 











The following implications may be drawn from this study: First,
that two weeks of specific training is productive of greater efficiency in
the three subjects considered. The greatest gain was made in reading,
equivalent to about twenty months. In problem solving, the increase
was seventeen months, while in the fundamentals of arithmetic, it was
seven months. Both groups were rather low in efficiency in the funda-
mentals of arithmetic at the beginning of the experiment, and even with
specific training were not able to attain the norm established for the
high Grade VI. In reality these groups were practically seventh
graders. However, the writer would not have it implied that such
training would produce equal increments in efficiency if
carried on indefinitely. Second, that the loss for group Y exceeded that
of group X in problem solving. The greater loss may have been due
to the shortness of the training period. Third, that in the fundamentals
of arithmetic, where mere skill is involved, group Y did not lose to the
same degree as group X. Fourth, that in reading both groups increased
in efficiency over the vacation, but group X had the greater increase.
Group Y may have attained its approximate maximum of efficiency
during the training period. Fifth, that the specific training acted as
a sort of “buffer” to prevent loss of material previously learned, for in
an analysis of the errors made before and after the vacation period, it
was found that seventy-seven per cent of the errors were identical.

The writer is planning to repeat the experiment in Grades II, IV and
VI, using two hundred pupils in each group for each grade. A larger
sample may produce different results.
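The seventy-seven per cent figure for identical errors depends on what is counted. The sketch below takes one plausible reading, pooled over pupils: errors that recur in September, as a share of all errors made in May. The denominator, the function name, and the sample item numbers are all assumptions for illustration. They are not the article's data.

```python
def percent_identical_errors(pupils):
    """Pooled share (in per cent) of May errors that recur in September.

    `pupils` maps each pupil to a pair (may_errors, sept_errors): sets of
    item numbers answered wrongly. Using May errors as the base is an
    assumption; the article does not say which base it used.
    """
    recurring = sum(len(may & sept) for may, sept in pupils.values())
    total_may = sum(len(may) for may, _ in pupils.values())
    return 100 * recurring / total_may if total_may else 0.0


if __name__ == "__main__":
    # Hypothetical item numbers, for illustration only.
    sample = {
        "pupil A": ({3, 7, 8, 12, 15}, {3, 7, 8, 12, 20}),
        "pupil B": ({2, 9, 11, 14, 18}, {2, 9, 11, 21}),
    }
    print(percent_identical_errors(sample))
```

Taking September errors as the base instead would generally give a different percentage for the same papers.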
Educational Psychology: An Introductory Text, by R. Pintner. New York:
Henry Holt and Co., 1929. Pp. XIV + 378. 


Trained under Professor Saintsbury in the Department of English 
of Edinburgh University, Professor Pintner does credit to his teacher. 
The book under review is written in a most engaging and effective style. 
This, to the reviewer, is its most outstanding characteristic. Secondly, 
the subject-matter, though conventionally divided into two roughly 
equal parts under the heads of Original Nature and The Modification 
of Original Nature, departs quite radically from the usual selections 
made by writers on Educational Psychology. Professor Pintner 
believes that Educational Psychology should be of practical use to the 
classroom teacher and has selected his material accordingly. There 
is very little space devoted to reflexes or instincts, and none at all 
to the sense organs and nervous system, although he acknowledges 
that behavior is dependent upon inherited mechanisms. Instead, the 
book fairly bristles with tests and experimental procedures, not only 
for the measurement of intelligence and achievement, but also for the 
rating of personality traits, character traits and moral traits. Since 
he occupies such an important position in the educational world, his
selection will undoubtedly influence profoundly the pedagogy of the 
subject. Thirdly, the book is well printed, well documented, and 
abounds in clear diagrams. 

Since the selection of material is empirical, depending upon the 
judgment of the author as to its fitness for inclusion, there is a consider- 
able lack of continuity in the treatment. One can almost start it 
equally well at any chapter. In reading it, one constantly challenges 
the decision to include this topic or that, and feels that the author’s 
next selection may be something totally different. Perhaps at this 
stage of the development of Educational Psychology any undue crys- 
tallization is to be deprecated. 

The text is planned as an introductory text and will undoubtedly 
be widely used in normal schools and other institutions where teachers 
are trained. Consequently, although the volume goes back to the
laboratory for most of its material, only the conclusions find a place. 
The details of statistical data are of necessity omitted. The reviewer’s 
experience is that students find summaries and conclusions, be they 
ever so skilfully presented, as they undoubtedly are in this text, far 
harder to grasp than a very great deal of factual evidence. It will be 
interesting to discover if beginners really find the book an easy one. 
But this is a digression. Let me close the review by saying that Pro- 
fessor Pintner has written a most important and probably epoch-making 
book—one that thoroughly deserves a rich success. 


PETER SANDIFORD. 
University of Toronto. 





An Historical Introduction to Modern Psychology, by Gardner Murphy,
with a Supplement by Heinrich Kluver. New York: Harcourt,
Brace and Co., 1929. Pp. XVII + 470.


“From color-theories to defence-mechanisms, from the functions 
of a white rat’s vibrissae to the mystic’s sense of unutterable revelation,
from imaginary playmates to partial correlations—wherein lies that 
unity of subject-matter which leads us to speak, compactly enough, 
of ‘contemporary psychology?’ ” Such is the opening sentence of this 
book. Offering an answer to this question, Murphy suggests that the 
unity may be found in an historical approach. He therefore outlines 
the history of psychology in three periods: (I) the pre-experimental
period, up to the middle of the nineteenth century; (II) the psycho-
physiological period, from the middle of the nineteenth century to its
end; and (III) twentieth-century contemporary psychology. He deals with these developments
in a clear, impartial, and detached manner and shows excellent judge- 
ment in including from the tremendous mass of material only such as is 
of significance in understanding the development of psychology. 

Modern psychology may be regarded as a number of controversies
carried on by the different schools. The description of how these
schools originated gives the reader an understanding of their
significance, and shows wherein they contain elements of truth and
value, how they have altered, changed, developed and fused, and the
manner in which they have contributed to the fund of psychological
knowledge. This is tremendously valuable, for a discussion of (say)
functionalism, whether favorable by a functionalist or satirical by a
structuralist, makes it difficult to appreciate the valuable work done
by the school and the extent to which functionalism enters into other
branches of psychology (not excluding structuralism).

One of the conclusions to which the book inevitably leads, as to the
reason for the welter of warring psychological schools, is clearly stated
by Kluver. He says, “We begin with an exaggeration of certain elements
of experience, we stress certain features of the event disproportionately
and efface others; we construct relations which on the basis of our
knowledge seem possible.” Such must be the method; but many forget
that the value of work so done depends upon knowledge of the
consequences of exaggeration, of omission and distortion, and of
selecting the wrong fundaments in educing relations, and upon a
willingness to admit the relativity of results obtained by so faulty
but inevitable a method. Much of the present mischief appears from
this history to be due to the unjustifiable dogmatic absolutism of
psychologists past and present.

Professor Boring in his presidential address to the American Psy- 
chological Association of 1928-1929, admitting the faultiness of available 
psychological methods, concluded that it was essential for the progress 
of the science that psychologists should be able to develop at least 
two phases of dissociation: (1) to believe fervently in their own school
and their own theories, and (2) to stand off at a distance and with an 
unbiased comprehension of all theories view their own. Murphy’s 
work is a very timely aid to the psychologist or student who agrees with 
Professor Boring. 

This book is one of the most valuable books on psychology that have
appeared for many years. It places in the hands of the student a
means of obtaining without a tremendous amount of library research a 
very comprehensive and, though brief, not superficial understanding of 
the background to his research efforts. It is primarily valuable for the 
graduate research worker and for those specialists who find it difficult to 
see their own work in correct perspective. Out of a pedantic past, 
through a blatant quarrelsome present, it looks forward and ought 
considerably to help toward a more praiseworthy future for psychology. 

C. S. SLOCOMBE. 
Boston Elevated Railway. 





A Social Interpretation of Education, by Joseph K. Hart. New York: 
Henry Holt and Co., 1929. Pp. 478. 


No man has had a more varied experience with American education 
than Dr. Hart. He has run the gamut of teaching from the one-room 
rural school to the great urban university. During his years on the
staff of the Survey he explored the many problems of adult education
thrust upon us by our changing civilization, and became intimately
acquainted with new experiments in education both at home and abroad.
Now, as dean of a university faculty of education, he formulates the
philosophy that has grown out of his long and active educational
experience.


Dr. Hart has himself summarized the problem of American education 
as he sees it, more succinctly and pointedly than could the reviewer: 


The true educational agency is the community within which and by means of which 
the individual comes to whatever maturity he reaches. By and large, the qualities 
of that community will be reflected in its members, variously, of course, as they 
have various capacities for responding to its impacts, and as they touch various 
facets of its existence. The real problem of education, then, becomes that of 
making a community that shall be expressive of humanity, present and to come. 
The disorganization of our older communities was inevitable, desirable; but dis- 
integration, as an accepted state, is not inevitable, desirable. Its continuance is 
merely proof of our immaturity, or our inertia. 

Nor can we ask some fragment of the disorganized community, for example, the 
school, to take over the whole problem of integration and handle it, ab extra,
arrogantly, intellectualistically. We have complimented human nature by this 
proposal long enough. Human life is real. It is as real as corn and cattle, or 
any other living thing; and it must achieve its ends, however desirable, by proc- 
esses that do not violate its own reality. The problem of education is the problem 
of community-making, in the most fundamental sense of the term. The problem 
of the school is merely a chapter in that more inclusive problem. School is 
important. But an unrelated school—a school that is unacquainted with, or 
indifferent to, the world within which it is attempting to operate, the world from 
which its “pupils” come each morning and to which they must go back evenings—
is an impertinence. A school that compels children to become “pupils” for some
hours each day is in the long run an immoral institution. The vitalities of life are 
in communities, not in institutions! 

All the lesser, detailed problems set forth in these pages find their proper mean- 
ings within the configuration of this problem. Primitive educations that persist 
beyond their time; disorganizations that break up old integrations; new organiza- 
tions that merely drift together like fragments of a shipwreck in an ocean whirlpool; 
new hypotheses forecasting a more intelligent future; new materials; new methods; 
new institutional alignments; new philosophies; new psychologies; new logics— 
whatever the past or the present has uncovered finds its real significance within the 
configuration of this problem. All these factors give content to the problem. 
But the problem is more than the sum of these factors; the whole is always more 
than the sum of its parts. The problem of education is before us, and until men 
find complete release and complete integration in community life that is fully 
released and fully integrated that problem will be there. 
The book is an eloquent and timely plea, in this day of idolatry of
technique, for a school that is more closely integrated with the actual 
currents of the community life of which it is a part. 


HARVEY ZORBAUGH.
New York University. 





Incentives to Study: A Survey of Student Opinion, by Albert B. Crawford.
New Haven: Yale University Press, 1929. Pp. XII + 188. 


Crawford’s Incentives to Study is a notable contribution, not only 
because of its scientific method, but because of the courage with which 
it states conclusions. No brief note can adequately summarize Craw- 
ford’s contribution, but one or two of his conclusions may be indicated 
for the special benefit of curriculum makers, whose efforts in the past 
have been directed too much to substituting one blanket prescription 
for another. 

“The fact that the secondary purpose underlying these factors 
exercises as much influence as it does, indicates that the curriculum, of 
itself alone, offers inadequate incentive to study. 

“This deficiency of the curriculum is largely due to lack of sufficient 
purpose discernible therein by the student. 

“Extension of the principle of distribution to the point where it 
represents meaningless compulsions put upon the student, often in 
conflict with his real academic interests, is an important contributory 
factor to this unsatisfactory situation.” 

The conclusions just quoted ought to help in rescuing curriculum
makers from the fallacy of identifying the “paper curriculum” with the
actual curriculum, which is what students learn, sometimes with the help
of the curriculum requirements and sometimes in spite of them.

Recent studies have emphasized the tremendous importance of the 
individual and the relatively lesser importance of courses of study, 
particularly those that are enforced by curriculum makers who assumed 
the role of sumptuary legislators. The constructive suggestion offered 
by Professor Crawford as given in his conclusions on page 125 is: 
“Arbitrary requirements and mass legislation affecting all students 
alike, unmindful of the differences between them, should be replaced 
by a course of study sufficiently flexible for adaptation to individual 
needs and aims, with emphasis on the purposeful relation of its parts to 
each other, and of the whole to the student’s life after graduation. 
Above all it should aim to capitalize intellectually his major interest.”

BEN D. WOOD.


Columbia University. 